Getting DRBD to boot up in a synchronized state from a cold boot

I am working in an environment with an embedded high-availability NVMe-oF cluster that needs to be available within minutes of a cold power-on of all nodes, and I am trying to set up RAID-10 on this cluster.

The legacy infrastructure I am working with is based on GFS2 and LVM2. Unfortunately, LVM's RAID-10 option appears to allow only one journal for GFS2, and I need more. I started down the path of manually setting up a series of RAID-1 arrays with DRBD, over which I could lay a RAID-0 with LVM2. I was able to set this up without much trouble. However, I have now hit a snag: how do I cleanly shut down and restart the nodes so that the data carries over seamlessly?
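For reference, the layering described above (DRBD RAID-1 pairs, an LVM stripe across them, GFS2 on top) can be sketched as the commands below. This is a hypothetical dry-run helper, not the poster's actual procedure: the device paths, LV name and cluster name are placeholders, it assumes a shared VG managed by lvmlockd (as the --lockstart and -asy commands used later suggest), and it only prints the commands unless DRY_RUN=0. With GFS2, the number of journals is fixed at mkfs.gfs2 time with -j (one per node that mounts the file system) and can be increased later with gfs2_jadd.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the commands by default and only runs
# them with DRY_RUN=0. Device, VG, LV and cluster names are placeholders.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: build_stack VG LV CLUSTER JOURNALS DEV...
build_stack() {
    local vg="$1" lv="$2" cluster="$3" journals="$4"
    shift 4
    # Each DRBD device (one RAID-1 pair) becomes an LVM physical volume.
    run pvcreate "$@" || return 1
    # --shared: lvmlockd-managed VG, matching the --lockstart / -asy usage.
    run vgcreate --shared "$vg" "$@" || return 1
    # One stripe per DRBD device forms the RAID-0 layer.
    run lvcreate --type striped -i "$#" -l 100%FREE -n "$lv" "$vg" || return 1
    # -j sets the journal count: GFS2 needs one journal per mounting node.
    run mkfs.gfs2 -p lock_dlm -t "$cluster:$lv" -j "$journals" "/dev/$vg/$lv"
}
```

For example, build_stack g1 lv0 mycluster 4 /dev/drbd0 /dev/drbd1 prints the four commands for a four-node cluster. The cluster name must match the actual cluster name, and the lvmlockd-specific steps (starting the lockspace on the other nodes, activation mode) should be checked against lvmlockd(8) for the installed LVM version.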

My initial, basic attempts have resulted in each board booting with synchronization at 0%, and it takes hours for them to resync. I used the following commands on each of the four nodes to try to shut down cleanly:

vgchange -a n g1    # g1 is the volume group laid over the physical volumes r0 and r1
vgchange --lockstop

drbdadm down r0     # DRBD resource configured as a physical volume
drbdadm down r1     # DRBD resource configured as a physical volume
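One way to make this shutdown more defensive is to refuse to take DRBD down while a resource is still resyncing, and to demote it explicitly first. The following is a minimal bash sketch under stated assumptions, not a tested procedure: it assumes two-node resources whose drbdadm dstate output has the form local/peer (for example UpToDate/UpToDate), assumes the GFS2 file system is already unmounted, and introduces hypothetical DRBDADM and VGCHANGE variables only so the commands can be stubbed out for testing.

```bash
#!/usr/bin/env bash
# Sketch only: assumes two-node DRBD resources whose `drbdadm dstate`
# prints "local/peer" (e.g. UpToDate/UpToDate) and that GFS2 is unmounted.
# DRBDADM / VGCHANGE are hypothetical overrides used to stub commands in tests.

DRBDADM="${DRBDADM:-drbdadm}"
VGCHANGE="${VGCHANGE:-vgchange}"

# Succeeds only if RES reports UpToDate on both the local and the peer side.
drbd_is_uptodate() {
    local res="$1" state
    state="$("$DRBDADM" dstate "$res" 2>/dev/null)" || return 1
    [ "$state" = "UpToDate/UpToDate" ]
}

# Usage: drbd_clean_stop VG RES...
# Checks every resource first, then deactivates the VG, stops its lockspace,
# and demotes and takes down each DRBD resource.
drbd_clean_stop() {
    local vg="$1" res
    shift
    for res in "$@"; do
        if ! drbd_is_uptodate "$res"; then
            echo "not stopping: $res is not UpToDate/UpToDate" >&2
            return 1
        fi
    done
    "$VGCHANGE" -a n "$vg" || return 1
    "$VGCHANGE" --lockstop "$vg" || return 1
    for res in "$@"; do
        "$DRBDADM" secondary "$res" || return 1
        "$DRBDADM" down "$res" || return 1
    done
}
```

On each node this would be called as drbd_clean_stop g1 r0 r1 after unmounting GFS2. Checking the disk states before deactivating the VG means a node that is still resyncing is left running instead of being stopped mid-sync.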

Then each board is power cycled, and I attempt to start back up with the following commands:

drbdadm up r0
drbdadm up r1
if [ "$(hostname)" = "appropriate-host" ]; then drbdadm primary --force r0; fi
if [ "$(hostname)" = "appropriate-host2" ]; then drbdadm primary --force r1; fi
vgchange --lockstart
vgchange -asy g1
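One alternative to promoting with --force on every boot is to bring the resource up, wait until it has reconnected to its peer and both sides report UpToDate, and only then promote. This is a minimal bash sketch, assuming two-node resources where drbdadm cstate prints Connected once the peer is reachable and drbdadm dstate prints local/peer disk states; the timeout and polling interval are arbitrary placeholder values, and DRBDADM is a hypothetical override for testing.

```bash
#!/usr/bin/env bash
# Sketch only: assumes two-node resources where `drbdadm cstate` prints
# "Connected" once the peer is reachable and `drbdadm dstate` prints
# "local/peer" disk states. DRBDADM is a hypothetical override for tests.

DRBDADM="${DRBDADM:-drbdadm}"

# Poll RES every INTERVAL seconds (must be > 0) for up to TIMEOUT seconds
# until it is Connected and UpToDate on both sides. Returns 1 on timeout.
drbd_wait_synced() {
    local res="$1" timeout="${2:-300}" interval="${3:-5}" waited=0
    local cstate dstate
    while [ "$waited" -lt "$timeout" ]; do
        cstate="$("$DRBDADM" cstate "$res" 2>/dev/null)"
        dstate="$("$DRBDADM" dstate "$res" 2>/dev/null)"
        if [ "$cstate" = "Connected" ] && [ "$dstate" = "UpToDate/UpToDate" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}

# Usage: drbd_start_and_promote RES [TIMEOUT] [INTERVAL]
# Brings RES up and promotes it only after it is back in sync. It never
# falls back to --force, which can make this node's data the sync source.
drbd_start_and_promote() {
    local res="$1" timeout="${2:-300}" interval="${3:-5}"
    "$DRBDADM" up "$res" || return 1
    if ! drbd_wait_synced "$res" "$timeout" "$interval"; then
        echo "$res not Connected and UpToDate/UpToDate after ${timeout}s; not promoting" >&2
        return 1
    fi
    "$DRBDADM" primary "$res"
}
```

On the host that should own r0, drbd_start_and_promote r0 would take the place of the drbdadm up r0 / primary --force r0 pair. Whether a node should ever promote itself with --force when its peer never appears is a policy decision this sketch deliberately leaves to the operator.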

I have noticed that sometimes this just works. Other times, I am told that my metadata is invalid and needs to be recreated. After running drbdadm create-md r0 (or r1), DRBD resynchronizes from 0%, which takes hours, and my project cannot sustain that. I am unsure whether a specific start-up/shut-down sequence would reliably avoid resynchronization, whether there is a way to make DRBD resynchronize faster, whether switching to a RAID-01 configuration (DRBD layered on top of two logical volumes) would give a more reliable starting state that can skip the initial synchronization, or whether I am using entirely the wrong tools for the job.
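On the resync-speed question: DRBD records which blocks are out of sync in a bitmap stored in its metadata, so as long as that metadata survives, a reconnect only resynchronizes the blocks marked out of sync. drbdadm create-md initializes fresh metadata, which is why the sync then starts from 0%. For the full-sync case, the resync rate is tunable per resource. The fragment below is a hedged sketch using option names from the drbd.conf(5) documentation; the values are placeholders, and the defaults, section placement and interaction between the fixed rate and the dynamic controller differ between DRBD 8.4 and 9, so check the man page for the installed version before applying it with drbdadm adjust r0.

```
resource r0 {
  disk {
    # Placeholder values; size them to the replication link.
    # With the dynamic resync controller enabled (c-plan-ahead > 0),
    # c-max-rate caps the resync rate.
    c-plan-ahead 20;
    c-max-rate 400M;
    # With c-plan-ahead 0 the controller is off and resync-rate is a fixed rate.
    resync-rate 400M;
  }
  # existing on/host, device, disk and meta-disk settings stay unchanged
}
```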

Does Server Fault have any insight to help me tool up my configuration for reliable, clean start-ups and shut-downs? Any help would be greatly appreciated!

batistuta09
This looks like an overcomplicated setup. Have you considered using StarWind VSAN instead of DRBD to get rid of those resyncs and issues? As far as I know, they have a free Linux version of their product that fully supports RAID and LVM, and they have started working on NVMe-oF features, so it is definitely worth talking to them. https://www.starwindsoftware.com/starwind-virtual-san-free