Add a server to an already running DRBD9 configuration

Question

Score:3

Server

Add a server to an already running DRBD9 configuration

Matt

3/3/23, 4:09 PM

I am on CentOS7 and I have an existing two-cluster HA node running pacemaker (1.1.23-1.el7_9.1) and DRBD (kmod-drbd90-9.0.22-3.el7_9). The backing partition for the DRBD drive is LUKS encrypted. We are adding a third server to the stack, but after updating the configuration, the drbd drive on the new server does not connect.

Status

The current status, as shown by the new box is:

[root@svr3]# drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  svr1 connection:Connecting
  svr2 connection:Connecting

From the primary, the status shows as:

[root@svr1]# drbdadm status
drbd0 role: Primary
  disk:UpToDate
  svr2 role:Secondary
    peer-disk:UpToDate
  svr3 connection:StandAlone

DRBD Configuration

The current configuration for the drbd0 resource is:

resource drbd0 {
  protocol C;
  device /dev/drbd0;
  disk /dev/sdb1;
  meta-disk internal;
  on svr1 {
    address 10.10.11.1:7789;
    node-id 1;
  }
  on svr2 {
    address 10.10.11.2:7789;
    node-id 2;
  }
  on svr3 {
    address 10.10.11.3:7789;
    node-id 3;
  }
  connection-mesh {
    hosts svr1 svr2 svr3;
  }
}

Prior to adding svr3, the configuration on svr1 and svr2 was as follows:

resource drbd0 {
  protocol C;
  device /dev/drbd0;
  disk /dev/sdb1;
  meta-disk internal;
  on svr1 {
    address 10.10.11.1:7789;
    node-id 1;
  }
  on svr2 {
    address 10.10.11.2:7789;
    node-id 2;
  }
  connection-mesh {
    hosts svr1 svr2;
  }
}

The DRBD disks were created with the following script on all boxes:

drbdadm create-md --force drbd0
drbdadm up drbd0

On the primary only, the following was also run to set up the disk:

dd if=/dev/zero of=/dev/sdb1 bs=128M count=10
drbdadm primary --force drbd0

cryptsetup -q --keyfile /path/to/keyfile luksFormat /dev/drbd0
cryptsetup --key-file /path/to/keyfile luksOpen /dev/drbd0 luks-drbd

mkfs.ext4 /dev/mapper/luks-drbd

Pacemaker Configuration

The DRBD resources in pacemaker were configured with the following script. The PCS resource hasn't changed, as it was originally set up to allow for a future third node.

pcs resource create drbd0_data ocf:linbit:drbd drbd_resource=drbd0
pcs resource master drbd0_clone drbd0_data \
  master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
  notify=true
pcs resource create drbd0_luks ocf:vendor:luks \
  --group=drbd_resources
pcs resource create drbd0_fs ocf:heartbeat:Filesystem \
  device=/dev/mapper/luks-drbd directory=/mnt/data fstype=ext4 \
  --group=drbd_resources

pcs constraint order promote drbd0_data then start drbd_resources
pcs constraint colocation add drbd_resources \
  with drbd0_clone INFINTITY with-rsc-role=Master
pcs constraint order drbd0_luks then drbd0_fs

(The drbd0_luks resource is a custom resource we provide that basically runs cryptsetup luksOpen|luksClose on the LUKS partition as appropriate).

The pacemaker status shows the following:

Online: [ svr1 svr2 svr3 ]

Active resources:

 Master/Slave Set: drbd0_clone [drbd0_data]
     Masters: [ svr1 ]
     Slaves: [ svr2 svr3 ]
 Resource Group: drbd_resources
     drbd0_luks (ocf::vendor::luks):     Started svr1
     drbd0_fs   (ocf::heartbeat::Filesystem):    Started svr1

Attempts to Connect

I've tried various iterations of the following process:

[root@svr1]# drbdadm disconnect drbd0

[root@svr2]# drbdadm disconnect drbd0

[root@svr3]# drbdadm disconnect drbd0
[root@svr3]# drbdadm connect --discard-my-data drbd0

[root@svr1]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10

[root@svr2]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10

After this, the output of drbdadm status is as shown at the top of the post. I get the same error if I attempt to run drbdadm adjust drbd0 on svr1 or svr2.

If I attempt to run drbdadm down drbd0 while the drbd0_luks resource is enabled, I get the following:

[root@svr1]# drbdadm down drbd0
drbd0: State change failed: (-12) Device is held open by someone
additional info from kernel:
/dev/drbd0 opened by cryptsetup (pid 11777) at 2021-11-01 16:50:51
Command 'drbdsetup down drbd0' terminated with exit code 11

If I disabled the drbd0_luks resource, I can run drbdadm down drbd0, but the adjust command fails with the following:

[root@svr1]# drbdadm adjust drbd0
0: Failure: (162) Invalid configuration request
Command 'drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal' terminated with exit code 10

so I am assuming that I at least need that much up and running. At this point I'm just grasping for straws, but I'm not quite sure what the next correct straw is to reach for.

145

0 + 0

drbd

pacemaker

centos7

Add a server to an already running DRBD9 configuration

Status

DRBD Configuration

Pacemaker Configuration

Attempts to Connect

Post an answer