DRBD does not promote resource on slave node after "pcs cluster stop --all"

I am trying to understand the recovery process of a promotable resource after "pcs cluster stop --all" and a shutdown of both nodes. I have a two-node cluster with a qdevice for quorum and a DRBD resource.

This is a summary of the resources before my test. Everything is working just fine and server2 is the master of DRBD.

 * fence-server1    (stonith:fence_vmware_rest):     Started server2
 * fence-server2    (stonith:fence_vmware_rest):     Started server1
 * Clone Set: DRBDData-clone [DRBDData] (promotable):
   * Masters: [ server2 ]
   * Slaves: [ server1 ]
 * Resource Group: nfs:
   * drbd_fs    (ocf::heartbeat:Filesystem):     Started server2

Then I issue "pcs cluster stop --all". The cluster is stopped on both nodes as expected. Now I restart server1 (previously the slave) and power off server2 (previously the master). When server1 comes back up it fences server2, and I can see server2 starting in vCenter, but I press a key at the GRUB menu so that server2 does not actually boot and instead stays "paused" on the GRUB screen.
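
For reference, the whole test boils down to something like this (server names as above; holding server2 at the GRUB menu is done by hand on its console):

# pcs cluster stop --all   # cleanly stop pacemaker/corosync on both nodes
# reboot                   # on server1, the former slave
# poweroff                 # on server2, the former master (after it is fenced and starts booting, hold it at GRUB)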

SSH'ing into server1 and running "pcs status", I get:

Cluster name: cluster1
Cluster Summary:
  * Stack: corosync
  * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) - partition with quorum
  * Last updated: Mon May  2 09:52:03 2022
  * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin on server1
  * 2 nodes configured
  * 11 resource instances configured

Node List:
  * Online: [ server1 ]
  * OFFLINE: [ server2 ]

Full List of Resources:
  * fence-server1    (stonith:fence_vmware_rest):     Stopped
  * fence-server2    (stonith:fence_vmware_rest):     Started server1
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
    * Slaves: [ server1 ]
    * Stopped: [ server2 ]
  * Resource Group: nfs:
    * drbd_fs    (ocf::heartbeat:Filesystem):     Stopped

Here are the constraints:

# pcs constraint
Location Constraints:
  Resource: fence-server1
    Disabled on:
      Node: server1 (score:-INFINITY)
  Resource: fence-server2
    Disabled on:
      Node: server2 (score:-INFINITY)
Ordering Constraints:
  promote DRBDData-clone then start nfs (kind:Mandatory)
Colocation Constraints:
  nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:

# sudo crm_mon -1A
...
Node Attributes:
  * Node: server2:
    * master-DRBDData                     : 10000

So I can see there is quorum, but server1 is never promoted to DRBD master, so the remaining resources stay stopped until server2 is back.
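
For completeness, DRBD's own view of the situation on server1 can be checked with the commands below; the resource name drbddata is only a guess based on the clone name, so substitute your own:

# drbdadm status drbddata   # local role, disk state and connection state as DRBD sees them
# drbdadm dump drbddata     # print the effective DRBD configuration for the resource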

  1. What do I need to do to force the promotion and recover without restarting server2?
  2. Why, if I instead reboot server2 and power off server1, can the cluster recover by itself?
  3. Does this mean that the DRBD data somehow got out of sync during the "pcs cluster stop --all"?
Dok (comment): What is the status of DRBD? Could you also include your DRBD configuration?
Answer:

I ran into the exact same issue with my setup, which is almost a carbon copy of yours, and I eventually managed to make it work. (I was testing what happens if there is a power outage, all servers in the cluster turn off, and only one storage node comes back.)

I'm not sure of your setup - I have a diskless witness for DRBD with a quorum setting of 1; the witness is also used as a qdevice for the cluster. I checked the status of the DRBD resource on the available node: it was Secondary locally, Connecting on the downed node, and Diskless (Connected/Secondary) on the witness node. I also checked the cluster quorum and made sure it was quorate.
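
Assuming similar DRBD 9 tooling, the checks described above map to roughly the following commands (no resource name is given to drbdadm so that all resources are shown):

# drbdadm status            # expect: local node Secondary, downed peer Connecting, witness Diskless
# corosync-quorumtool -s    # "Quorate: Yes" means the qdevice vote is still counted
# pcs quorum status         # the same information, queried through pcs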

After that, I made the DRBD resource primary on the available node. I eventually figured out that if I (temporarily) disabled STONITH on the cluster, the DRBD resource and the subsequent resources started immediately and in order. After 'fixing' the downed node, I re-enabled STONITH and made sure the resources could move around properly.
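
In pcs terms that is essentially something like the following; treat it as a sketch, and only do it while you are certain the other node is really down:

# pcs property set stonith-enabled=false   # temporary workaround while the peer is confirmed down
# pcs status                               # DRBDData-clone should now get promoted and the nfs group started
# pcs property set stonith-enabled=true    # re-enable once the downed node is repaired and has rejoined

If DRBD is promoted by hand first, as described above, the usual command is "drbdadm primary <resource>" on the surviving node, but with the resource under cluster control it is normally cleaner to let Pacemaker do the promotion.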
