I am trying to understand the recovery process of a promotable
resource after "pcs cluster stop --all" followed by a shutdown of both
nodes. The setup is a two-node cluster plus a qdevice for quorum, with
a DRBD resource. Here is a summary of the resources before my test;
everything is working fine and server2 is the DRBD master.
* fence-server1 (stonith:fence_vmware_rest): Started server2
* fence-server2 (stonith:fence_vmware_rest): Started server1
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Masters: [ server2 ]
* Slaves: [ server1 ]
* Resource Group: nfs:
* drbd_fs (ocf::heartbeat:Filesystem): Started server2
Then I issue "pcs cluster stop --all", and the cluster stops on both
nodes as expected.
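(For reference, before stopping the cluster I would expect the DRBD
disk states to be clean on both nodes; something along these lines
should show UpToDate/UpToDate -- "drbd0" is just a placeholder for the
actual DRBD resource name:
# drbdadm dstate drbd0
UpToDate/UpToDate
)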
Now I restart server1 (previously the slave) and power off server2
(previously the master). When server1 comes back up it fences server2,
and I can see server2 starting in vCenter, but I press a key at the
GRUB menu so that server2 does not actually boot; it just stays
"paused" at the GRUB screen.
SSH'ing into server1 and running "pcs status", I get:
Cluster name: cluster1
Cluster Summary:
* Stack: corosync
* Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
* Last updated: Mon May 2 09:52:03 2022
* Last change: Mon May 2 09:39:22 2022 by root via cibadmin on server1
* 2 nodes configured
* 11 resource instances configured
Node List:
* Online: [ server1 ]
* OFFLINE: [ server2 ]
Full List of Resources:
* fence-server1 (stonith:fence_vmware_rest): Stopped
* fence-server2 (stonith:fence_vmware_rest): Started server1
* Clone Set: DRBDData-clone [DRBDData] (promotable):
* Slaves: [ server1 ]
* Stopped: [ server2 ]
* Resource Group: nfs:
* drbd_fs (ocf::heartbeat:Filesystem): Stopped
Here are the constraints:
# pcs constraint
Location Constraints:
Resource: fence-server1
Disabled on:
Node: server1 (score:-INFINITY)
Resource: fence-server2
Disabled on:
Node: server2 (score:-INFINITY)
Ordering Constraints:
promote DRBDData-clone then start nfs (kind:Mandatory)
Colocation Constraints:
nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:
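(For completeness, these constraints were created with pcs commands
along these lines -- paraphrased from memory, not the exact shell
history:
# pcs constraint order promote DRBDData-clone then start nfs
# pcs constraint colocation add nfs with master DRBDData-clone INFINITY
)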
# sudo crm_mon -1A
...
Node Attributes:
* Node: server2:
* master-DRBDData : 10000
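As a sanity check, the transient promotion score can also be queried
directly; on server1 it appears to be unset. (Command shown for
illustration -- as far as I understand, the DRBD agent sets this as a
transient node attribute, hence --lifetime reboot:)
# crm_attribute --node server1 --name master-DRBDData --lifetime reboot --query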
So I can see there is quorum, but server1 is never promoted to DRBD
master, and therefore the remaining resources stay stopped until
server2 is back.
- What do I need to do to force the promotion and recover without
restarting server2?
- Why is it that if, instead of rebooting server1 and powering off
server2, I reboot server2 and power off server1, the cluster can
recover by itself?
- Does that mean that the DRBD data somehow got out of sync during
"pcs cluster stop --all"?
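For the first question, the only workaround I can think of is forcing
the promotion at the DRBD level, outside of Pacemaker, roughly like
this (untested, and "drbd0" is again a placeholder for the actual
resource name):
# drbdadm status drbd0            # check connection and disk state first
# drbdadm primary --force drbd0   # force primary even without the peer
but I would rather understand the proper cluster-level way to recover.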