Score:6

DRBD Cluster nodes not configured (StandAlone)


I have an HA cluster with two nodes: node 1 is the primary and node 2 is its mirror. I have a problem with the mysql resource, since my nodes are not synchronized.

drbd-overview

Primary node:
0:home Connected Primary/Secondary UpToDate/UpToDate C r-----
1:storage Connected Secondary/Primary UpToDate/UpToDate C r-----
2:mysql StandAlone Secondary/Unknown UpToDate/Outdated r-----

Secondary node:
0:home Connected Secondary/Primary UpToDate/UpToDate C r-----
1:storage Connected Primary/Secondary UpToDate/UpToDate C r-----
2:mysql StandAlone Primary/Unknown UpToDate/Outdated r-----

Reviewing the messages file, I found the following:

Apr-19 18:20:36 clsstd2 kernel: block drbd2:self C1480E287A8CAFAB:C7B94724E2658B94:5CAE57DEB3EDC4EE:F5887A918B55FB1A bits:114390101 flags:0
Apr-19 18:20:36 clsstd2 kernel: block drbd2:peer 719D326BDE8272E2:0000000000000000:C7BA4724E2658B94:C7B94724E2658B95 bits:0 flags:1
Apr-19 18:20:36 clsstd2 kernel: block drbd2:uuid_compare()=-1000 by rule 100                           
Apr-19 18:20:37 clsstd2 kernel: block drbd2:Unrelated data, aborting!
Apr-19 18:20:37 clsstd2 kernel: block drbd2:conn (WFReportParams -> Disconnecting)
Apr-19 18:20:37 clsstd2 kernel: block drbd2:error receiving ReportState, l: 4!
Apr-19 18:20:38 clsstd2 kernel: block drbd2:asender terminated
Apr-19 18:20:38 clsstd2 kernel: block drbd2:Terminating asender thread
Apr-19 18:20:38 clsstd2 kernel: block drbd2:Connection closed
Apr-19 18:20:38 clsstd2 kernel: block drbd2:conn (Disconnecting -> StandAlone)
Apr-19 18:20:39 clsstd2 kernel: block drbd2:receiver terminated
Apr-19 18:20:39 clsstd2 kernel: block drbd2:Terminating receiver thread
Apr-19 18:20:39 clsstd2 auditd[3960]: Audit daemon rotating log files

I don't understand what the problem is or how to solve it. Checking both nodes, I noticed that the /var/lib/mysql directory on node 2 is missing the ibdata1 file, while it exists on node 1.

Score:5

The problem is that you hit a DRBD split-brain condition and both nodes went to the "StandAlone" state. It's hard to say whether the DB on your primary node is valid or corrupted, but for now you have two routes to choose from:

(1) Try to resync the nodes, designating one of them as holding the more recent version of the data (not necessarily true in your case).

(This is what you run on the second node…)

# drbdadm secondary resource
# drbdadm disconnect resource
# drbdadm -- --discard-my-data connect resource

(This is what you run on the surviving node, the one you believe has the most recent version of the data…)

# drbdadm connect resource

If that doesn't help, you can trash the second node's data and force a full resync from the peer by running…

# drbdadm invalidate resource

(2) Purge the data on both nodes with the last command from (1) and recover your DB from backups.
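(For completeness, a minimal sketch of route (2), assuming DRBD 8.4 or later, the same placeholder resource name as above, and a verified DB backup. Run the first three commands on both nodes, the last one only on the node you will restore onto:)

# drbdadm down resource
# drbdadm create-md --force resource
# drbdadm up resource
# drbdadm primary --force resource

(After that, create a filesystem or mount the DRBD device on the primary, restore the database from backup, and let DRBD replicate the blocks to the peer.)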

Hope this helps!

P.S. I would really recommend avoiding DRBD in production. What you see is quite a common thing, unfortunately.

Strepsils
Right, this is a split brain in DRBD, and possibly there is a corresponding message in the logs: "kernel: block drbd0: Split-Brain detected, dropping connection!" (although it's not always detected). Route 1 is worth trying. Just an example to illustrate: https://www.suse.com/support/kb/doc/?id=000019009. And you're right, DRBD is well known for this issue. To avoid it, either use quorum with a third node or go for something that works properly on two nodes, like StarWind vSAN for example.
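(To illustrate the quorum suggestion: in DRBD 9, quorum is enabled per resource in the options section, roughly as sketched below. The resource name is just a placeholder, and a third node, even a diskless one, is needed to form a majority:)

resource mysql {
    options {
        quorum majority;
        on-no-quorum io-error;
    }
    ...
}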
Score:1

Thank you! Indeed, the solution was to recreate the metadata. I ran the following commands on the node where I wanted to recreate the metadata, and now everything is synchronized again.

drbdadm down resource
drbdadm wipe-md resource
drbdadm create-md resource
drbdadm up resource
drbdadm disconnect resource
drbdadm connect resource

The last command is executed first on the node where the metadata was recreated, and then on the other node.

Finally, run cat /proc/drbd and you can watch the synchronization progress.
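(While the resync runs, the mysql line in /proc/drbd should look roughly like the snippet below; this is illustrative output with made-up numbers, not taken from this cluster:)

2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1048576 dw:1048576 dr:0 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:113341525
        [=>..................] sync'ed: 10.4% (110685/111709)M
        finish: 1:02:30 speed: 30,720 (28,640) K/sec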

Score:1
Dok

The issue here is the "Unrelated data, aborting!" you see in the logs. Likely the nodes have changed roles enough times, while disconnected, that the historical generation identifiers within the metadata no longer match. See the DRBD User's Guide here for further information: https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-gi
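(If you want to verify this, drbdadm can print the generation identifiers so you can compare them on both nodes; assuming the resource in this thread is named mysql:)

# drbdadm get-gi mysql
# drbdadm show-gi mysql

(get-gi prints the bare UUID tuple, while show-gi adds explanatory text; on healthy, in-sync peers the current UUIDs match.)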

At this point, you will need to select one node to overwrite the data of the other and perform a new full sync. To do this, recreate the metadata on the node that is to become the SyncTarget. You can do this with drbdadm create-md <resource>

Iván Jf
Thank you for answering. When performing these steps, is the data on the main node not at risk?
Dok
As long as you do not recreate the metadata on the primary node, it will automatically be chosen as the SyncSource once they connect.
Iván Jf
Thanks, you were right, the solution was to recreate the metadata.