Due to Ceph's poor performance on NVMe, I'm experimenting with OCFS2 on DRBD again.
DRBD appears to have initially been built around the idea of notifying an application of a hardware failure and having the application take the appropriate steps to move access to a new primary. That is not a useful pattern for our case.
DRBD 9 documents a quorum-based approach:
https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#s-configuring-quorum
If I understand correctly, loss of quorum (i.e. the current node ends up in the minority partition) results in I/O being frozen until quorum is re-established. This is exciting news, as we have good experience with Ceph recovering this way without intervention.
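For reference, this is roughly what I understand the quorum configuration to look like; the resource name, device paths, node names and addresses below are placeholders, and the option values are just my reading of the guide:

    resource r0 {
      options {
        quorum majority;          # a majority of nodes is needed for the resource to allow I/O
        on-no-quorum suspend-io;  # freeze I/O on quorum loss instead of returning errors
      }
      device    /dev/drbd0;
      disk      /dev/nvme0n1p1;
      meta-disk internal;
      # quorum needs at least three voters (or two plus a diskless tiebreaker)
      on node-a { node-id 0; address 10.0.0.1:7789; }
      on node-b { node-id 1; address 10.0.0.2:7789; }
      on node-c { node-id 2; address 10.0.0.3:7789; }
      connection-mesh { hosts node-a node-b node-c; }
    }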
Sadly, the DRBD manual then says in section 12 (the OCFS2 chapter)
https://linbit.com/drbd-user-guide/drbd-guide-9_0-en/#ch-ocfs2
"All cluster file systems require fencing – not only through the DRBD resource, but STONITH! A faulty member must be killed."
which seems to contradict what the quorum documentation says earlier.
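To make sure I'm reading "fencing through the DRBD resource" correctly, I assume it refers to something like the following; this is only a sketch, I believe the fencing policy sits in the net section in DRBD 9 (it was a disk option in 8.4), and the handler paths may differ per distribution:

    resource r0 {
      net {
        fencing resource-and-stonith;  # suspend I/O and call fence-peer; resume once the peer is confirmed fenced
      }
      handlers {
        # Pacemaker helpers shipped with drbd-utils; the actual node kill (STONITH)
        # would be configured separately in the cluster manager.
        fence-peer   "/usr/lib/drbd/crm-fence-peer.9.sh";
        unfence-peer "/usr/lib/drbd/crm-unfence-peer.9.sh";
      }
    }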
I'm not sure why fencing would be required at all if quorum is already supposed to prevent the minority partition from executing write I/O. Is this because quorum does not actually work with dual-primary?
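Concretely, what I would like to run is dual-primary with quorum and no STONITH, roughly like this (again just a sketch: option names are taken from the guide, and the split-brain policies are my guess at sane defaults, not a recommendation):

    resource r0 {
      options {
        quorum majority;
        on-no-quorum suspend-io;
      }
      net {
        allow-two-primaries yes;            # needed to mount OCFS2 on both nodes at once
        after-sb-0pri discard-zero-changes; # split-brain recovery policies as in the OCFS2 chapter
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
      }
    }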