
Stopping DRBD so I can run some tests with a VM


We have two servers that I inherited, both running DRBD, with each one running KVM virtual machines.

I would love to stop a VM running on server1 and bring up just that one VM on server2 for some tests. But with DRBD doing its thing on these servers, and the broken startup script I have from server2 (posted below), it makes me nervous: I don't want to fully stop server1, just the one VM on it. I didn't create or configure these machines, and I'm in doubt whether the DRBD setup (which I know little about) was properly implemented. Server1's stop script and server2's start script are both posted below.

But before all that, I guess I just want to know how to safely stop DRBD from mucking with the two servers for a time, so that I can mount a filesystem on server2 and bring up a VM that I stopped on server1.

Server1 site stop script:

echo    poweroff -p now
echo
read -rsp $'Press any key to continue...\n' -n1 key

virsh shutdown irsc
virsh shutdown backup
virsh shutdown user
virsh shutdown repository
virsh shutdown web-firewall
virsh shutdown wiki
virsh shutdown a-gateway
virsh shutdown b-gateway
virsh shutdown dhcp
 
# shutdown the drbd
#drbd-stop
echo now manually turn off drbd
echo     umount /systems
echo     drbdadm secondary all
echo     drbd-overview

Why drbd-stop is commented out, and why the script echoes commands it should actually be running, I have no idea. But okay, that's the stop script. Server1's .img files for the KVM guests live in /systems, by the way.

So I go to server2. First issue: the /systems folder has no img files in it, but there is a mount line in the startup script. Here is the start script for server2 (I have no idea what the nodedev-detach pci commands are really doing):

#!/bin/sh
# isolate the CPUs for the VMs
#site-isolate

# backup 192 network
virsh nodedev-detach pci_0000_06_10_2
# 10.7
virsh nodedev-detach pci_0000_02_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_3
# 10.2
virsh nodedev-detach pci_0000_02_10_1

# a-gateway
# 192
virsh nodedev-detach pci_0000_06_10_0
# 10.5
virsh nodedev-detach pci_0000_06_10_1
# 10.7
virsh nodedev-detach pci_0000_02_10_4

# b-gateway
# 192
virsh nodedev-detach pci_0000_06_10_4
# 10.2
virsh nodedev-detach pci_0000_02_10_5

# dhcp
# 10.5
virsh nodedev-detach pci_0000_06_10_7
# 10.7
virsh nodedev-detach pci_0000_02_11_0
# 10.2
virsh nodedev-detach pci_0000_02_11_1

# dns2
# 192
virsh nodedev-detach pci_0000_06_11_0

# web-server
# 10.7
virsh nodedev-detach pci_0000_02_11_4

# web-firewall
# 192
virsh nodedev-detach pci_0000_06_10_6
# 10.7
virsh nodedev-detach pci_0000_02_12_4
# 10.2
virsh nodedev-detach pci_0000_02_11_5

# irsc
# 10.7
virsh nodedev-detach pci_0000_02_13_0
# BTTV
virsh nodedev-detach pci_0000_09_00_0

# firewall
# 10.25
virsh nodedev-detach pci_0000_02_12_1
# 10.5
virsh nodedev-detach pci_0000_06_11_1

# bro-server
# 192
virsh nodedev-detach pci_0000_06_11_2

echo start drbd
# start the disk mirror with the slave
service drbd start
sleep 2

# now setup drbd and filesystems

# for all VM images, mount the /systems
drbdadm primary systems
mount /dev/drbd/by-res/systems /systems

# for arc-gateway
drbdadm primary arc-gateway-data

# for backup
drbdadm primary archive
drbdadm primary amanda

# for user computer
drbdadm primary users

# for web server computer
drbdadm primary web-server

# for wiki
drbdadm primary svn

# for irsc. *** this is the one I want to bring up?  do I have to do this drbdadm primary irsc
drbdadm primary irsc

echo start vms
# start the VMs
# fundamental servers
virsh start dns2
virsh start dhcp
# take a long time to start servers
virsh start devel1
virsh start xmail
# gateways, sdss-gateway takes a long time
virsh start sdss-gateway
virsh start arc-gateway
virsh start user
# APO servers
virsh start web-server
virsh start backup
virsh start repository
virsh start wiki
virsh start irsc

# finally web firewall, now online to the world
virsh start web-firewall
Comment from Dok: What is the backing disk that the irsc VM is configured to use? Is it using a raw /dev/drbdX device, or is it using some VM image file hosted on a filesystem?
Reply from the asker: About that backing disk: the startup script mounts a filesystem, mount /dev/drbd/by-res/systems /systems, which has the .img files in it. So if I mount it on the backup system, I am not sure whether DRBD will lose its mind.
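
(A quick, read-only way to confirm which backing disk a guest is actually configured to use, assuming the standard libvirt tooling the scripts above already rely on:)

# list the block devices attached to the irsc guest; the "Source" column shows the image file or device
virsh domblklist irsc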
Answer from Dok:

As you explained in the comment above, all the VMs' root volumes are stored as image files in the filesystem mounted at /systems. To safely fail this over to the peer system, you would need to stop all access to this filesystem (stop all the VMs) and unmount it first. This lumps all the VMs together, and means you would have to fail over all of them.
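
A rough sketch of that clean failover, assuming the resource and mount-point names used in the scripts above (resource systems mounted at /systems) and that every guest with an image on it is shut down first:

## On server1 (currently primary)
## shut down every VM whose image lives on /systems, e.g.:
# virsh shutdown irsc
## ...repeat for the other guests, then release the filesystem and demote:
# umount /systems
# drbdadm secondary systems

## On server2 (currently secondary)
# drbdadm primary systems
# mount /dev/drbd/by-res/systems /systems
# virsh start irsc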

One option, which is generally not advised, would be to disconnect the DRBD nodes and deliberately cause a split-brain. Essentially both nodes would be primary at the same time, which causes data divergence that you will need to resolve manually before you can reconnect them. I would first verify that your DRBD configuration doesn't include any automatic split-brain recovery options. The procedure should be similar to the one below. Use caution here, particularly with the --discard-my-data command: running these from the wrong node could be disastrous.

## From the secondary node
# drbdadm disconnect systems
# drbdadm primary systems
## Verify irsc is stopped on the peer
# virsh start irsc
## Do whatever testing you need
# virsh shutdown irsc
# drbdadm secondary systems
# drbdadm connect --discard-my-data systems
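
Depending on the DRBD version and configuration, the node whose data you keep (server1 here) may drop to StandAlone once the split-brain is detected; if so, you would also need to reconnect from that side. A minimal sketch, assuming the same systems resource:

## On the primary node (the one whose data survives),
## only if its connection state shows StandAlone:
# drbdadm cstate systems
# drbdadm connect systems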
Reply from the asker: So this mount /dev/drbd/by-res/systems /systems is a shared resource between the nodes? Hence I cannot just have it mounted on both nodes after deliberately shutting down DRBD on both of them (basically divorcing them from DRBD). I understand your commands in general and do realize how disastrous doing these things with DRBD can be. Not being a real sysadmin, and having inherited this system, the DRBD setup as a whole makes me as nervous as can be.
Follow-up from Dok:
DRBD, in the majority of cases, replicates data between hosts onto their respective local storage. So it's not "shared" in the classic sense; instead, both nodes have identical block-level copies of the data. It sounds like you have removed DRBD for the present time; I suspect you're just accessing the backing storage directly now. Just know that if you ever want to re-enable DRBD, you'll need to do a full sync, since you've now altered the underlying data without any way for DRBD to be aware of it.
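
If you ever do re-enable it, a minimal sketch of forcing that full sync, assuming the systems resource and that the node you kept running is the one whose data should survive (these commands run on the node whose copy is to be discarded):

## Bring DRBD back up on the stale node
# service drbd start
## Mark its local copy inconsistent so a full resync is pulled from the peer
# drbdadm invalidate systems
# drbdadm connect systems
## Watch resync progress
# cat /proc/drbd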