I am on CentOS 7 with an existing two-node HA cluster running Pacemaker (1.1.23-1.el7_9.1) and DRBD (kmod-drbd90-9.0.22-3.el7_9). The DRBD device is backed by a local partition and carries a LUKS-encrypted volume on top of it. We are adding a third server to the stack, but after updating the configuration, the DRBD device on the new server does not connect.
Status
The current status, as shown on the new box, is:
[root@svr3]# drbdadm status
drbd0 role:Secondary
  disk:Inconsistent
  svr1 connection:Connecting
  svr2 connection:Connecting
From the primary, the status shows as:
[root@svr1]# drbdadm status
drbd0 role:Primary
  disk:UpToDate
  svr2 role:Secondary
    peer-disk:UpToDate
  svr3 connection:StandAlone
DRBD Configuration
The current configuration for the drbd0 resource is:
resource drbd0 {
    protocol C;
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
    on svr1 {
        address 10.10.11.1:7789;
        node-id 1;
    }
    on svr2 {
        address 10.10.11.2:7789;
        node-id 2;
    }
    on svr3 {
        address 10.10.11.3:7789;
        node-id 3;
    }
    connection-mesh {
        hosts svr1 svr2 svr3;
    }
}
Prior to adding svr3, the configuration on svr1 and svr2 was as follows:
resource drbd0 {
    protocol C;
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
    on svr1 {
        address 10.10.11.1:7789;
        node-id 1;
    }
    on svr2 {
        address 10.10.11.2:7789;
        node-id 2;
    }
    connection-mesh {
        hosts svr1 svr2;
    }
}
The DRBD disks were created with the following script on all boxes:
drbdadm create-md --force drbd0
drbdadm up drbd0
On the primary only, the following was also run to set up the disk:
dd if=/dev/zero of=/dev/sdb1 bs=128M count=10
drbdadm primary --force drbd0
cryptsetup -q --key-file /path/to/keyfile luksFormat /dev/drbd0
cryptsetup --key-file /path/to/keyfile luksOpen /dev/drbd0 luks-drbd
mkfs.ext4 /dev/mapper/luks-drbd
Pacemaker Configuration
The DRBD resources in pacemaker were configured with the following script. The pcs configuration hasn't changed, as it was originally set up to allow for a future third node.
pcs resource create drbd0_data ocf:linbit:drbd drbd_resource=drbd0
pcs resource master drbd0_clone drbd0_data \
    master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 \
    notify=true
pcs resource create drbd0_luks ocf:vendor:luks \
    --group=drbd_resources
pcs resource create drbd0_fs ocf:heartbeat:Filesystem \
    device=/dev/mapper/luks-drbd directory=/mnt/data fstype=ext4 \
    --group=drbd_resources
pcs constraint order promote drbd0_data then start drbd_resources
pcs constraint colocation add drbd_resources \
    with drbd0_clone INFINITY with-rsc-role=Master
pcs constraint order drbd0_luks then drbd0_fs
(The drbd0_luks resource is a custom resource agent we provide that basically runs cryptsetup luksOpen|luksClose on the LUKS partition as appropriate.)
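For reference, the agent's start/stop logic boils down to roughly the following. This is a simplified illustration rather than the actual agent: OCF metadata, return codes, and validation are omitted, and the keyfile path is the same placeholder used above.

#!/bin/sh
# Simplified sketch of the drbd0_luks agent's actions; not the real agent.
KEYFILE=/path/to/keyfile
DEVICE=/dev/drbd0
NAME=luks-drbd

case "$1" in
    start)
        # Open the LUKS container that sits on top of the DRBD device
        cryptsetup --key-file "$KEYFILE" luksOpen "$DEVICE" "$NAME"
        ;;
    stop)
        # Close the mapping so the DRBD device can be demoted or downed
        cryptsetup luksClose "$NAME"
        ;;
    monitor)
        # Treat the resource as running if the device-mapper node exists
        [ -e /dev/mapper/"$NAME" ]
        ;;
esac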
The pacemaker status shows the following:
Online: [ svr1 svr2 svr3 ]
Active resources:
 Master/Slave Set: drbd0_clone [drbd0_data]
     Masters: [ svr1 ]
     Slaves: [ svr2 svr3 ]
 Resource Group: drbd_resources
     drbd0_luks (ocf::vendor:luks): Started svr1
     drbd0_fs (ocf::heartbeat:Filesystem): Started svr1
Attempts to Connect
I've tried various iterations of the following process:
[root@svr1]# drbdadm disconnect drbd0
[root@svr2]# drbdadm disconnect drbd0
[root@svr3]# drbdadm disconnect drbd0
[root@svr3]# drbdadm connect --discard-my-data drbd0
[root@svr1]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10
[root@svr2]# drbdadm connect drbd0
drbd0: Failure: (162) Invalid configuration request
Command 'drbdsetup connect drbd0 3' terminated with exit code 10
After this, the output of drbdadm status is as shown at the top of the post. I get the same error if I attempt to run drbdadm adjust drbd0 on svr1 or svr2.
If I attempt to run drbdadm down drbd0 while the drbd0_luks resource is enabled, I get the following:
[root@svr1]# drbdadm down drbd0
drbd0: State change failed: (-12) Device is held open by someone
additional info from kernel:
/dev/drbd0 opened by cryptsetup (pid 11777) at 2021-11-01 16:50:51
Command 'drbdsetup down drbd0' terminated with exit code 11
If I disable the drbd0_luks resource, I can run drbdadm down drbd0, but the adjust command fails with the following:
[root@svr1]# drbdadm adjust drbd0
0: Failure: (162) Invalid configuration request
Command 'drbdsetup attach 0 /dev/sdb1 /dev/sdb1 internal' terminated with exit code 10
So I am assuming that I at least need that much up and running. At this point I'm just grasping at straws, but I'm not quite sure which straw is the correct one to reach for next.