Score:4

Cannot write files into highly available NFS storage created with DRBD and Pacemaker (Permission denied error returned)

I am trying to set up highly available NFS storage with DRBD and Pacemaker (first time doing this) on 2 Fedora 38 VMs.

My main guidance for this endeavor was these 2 docs: doc1 doc2

I've managed to start the Pacemaker cluster and to mount the NFS shared folder on my hosts, but when I try to write anything in that folder, I get a permission denied error.
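
To illustrate, this is roughly what the failure looks like from a client (the local mount point /mnt/ha below is just an example path, not my exact setup):

#> sudo mount -t nfs 192.168.1.101:/nfsshare/exports/HA /mnt/ha   # mount via the cluster's virtual IP
#> touch /mnt/ha/testfile
touch: cannot touch '/mnt/ha/testfile': Permission denied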

Changing the mount point permissions to 666 or 777 doesn't help.

Any idea what could be wrong?

My DRBD config looks like this:

#> sudo vi /etc/drbd.d/global_common.conf 
global {
 usage-count  yes;
}
common {
 disk {
    no-disk-flushes;
    no-disk-barrier;
    c-fill-target 24M;
    c-max-rate   720M;
    c-plan-ahead    15;
    c-min-rate     4M;
  }
  net {
    protocol C;
    max-buffers            36k;
    sndbuf-size            1024k;
    rcvbuf-size            2048k;
  }
}

#> sudo vi /etc/drbd.d/ha_nfs.res

resource ha_nfs {
  device "/dev/drbd1003";
  disk "/dev/nfs/share";
  meta-disk internal;
  on server1.test {
    address 192.168.1.116:7789;
  }
  on server2.test {
    address 192.168.1.167:7789;
  }
}
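
For context, the resource was brought up with the usual drbdadm steps, roughly the following (this is from memory, so the exact commands/order may differ slightly):

#> sudo drbdadm create-md ha_nfs        # write DRBD metadata (on both nodes)
#> sudo drbdadm up ha_nfs               # bring the resource up (on both nodes)
#> sudo drbdadm primary --force ha_nfs  # promote one node for the initial sync
#> sudo mkfs.ext4 /dev/drbd1003         # create the filesystem (on the Primary only)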

The Pacemaker config looks like this:

crm> configure edit
node 1: server1.test
node 2: server2.test
primitive p_drbd_attr ocf:linbit:drbd-attr
primitive p_drbd_ha_nfs ocf:linbit:drbd \
        params drbd_resource=ha_nfs \
        op monitor timeout=20s interval=21s role=Slave start-delay=12s \
        op monitor timeout=20s interval=20s role=Master start-delay=8s
primitive p_expfs_nfsshare_exports_HA exportfs \
        params clientspec="192.168.1.0/24" directory="/nfsshare/exports/HA" fsid=1003 unlock_on_stop=1 options="rw,mountpoint" \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=120s
primitive p_fs_nfsshare_exports_HA Filesystem \
        params device="/dev/drbd1003" directory="/nfsshare/exports/HA" fstype=ext4 run_fsck=no \
        op monitor interval=15s timeout=40s start-delay=15s \
        op_params OCF_CHECK_LEVEL=0 \
        op start interval=0s timeout=60s \
        op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver
primitive p_pb_block portblock \
        params action=block ip=192.168.1.101 portno=2049 protocol=tcp
primitive p_pb_unblock portblock \
        params action=unblock ip=192.168.1.101 portno=2049 tickle_dir="/srv/drbd-nfs/nfstest/.tickle" reset_local_on_unblock_stop=1 protocol=tcp \
        op monitor interval=10s timeout=20s start-delay=15s
primitive p_virtip IPaddr2 \
        params ip=192.168.1.101 cidr_netmask=32 \
        op monitor interval=1s timeout=40s start-delay=0s \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
        meta master-max=1 master-node-max=1 clone-node-max=1 clone-max=2 notify=true
clone c_drbd_attr p_drbd_attr
colocation co_ha_nfs inf: p_pb_block p_virtip ms_drbd_ha_nfs:Master p_fs_nfsshare_exports_HA p_expfs_nfsshare_exports_HA p_nfsserver p_pb_unblock
property cib-bootstrap-options: \
        have-watchdog=false \
        cluster-infrastructure=corosync \
        cluster-name=nfsCluster \
        stonith-enabled=false \
        no-quorum-policy=ignore
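
For anyone who wants to double-check the above, the configuration can be inspected/validated with the standard tools (generic commands, not something specific from the docs I followed):

#> sudo crm_verify -L -V        # validate the live CIB
#> sudo pcs resource config     # show the resources as pcs sees them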

pcs status output:

[bebe@server2 share]$ sudo pcs status
[sudo] password for bebe:
Cluster name: nfsCluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: server1.test (version 2.1.6-4.fc38-6fdc9deea29) - partition with quorum
  * Last updated: Thu Jul 13 08:50:34 2023 on server2.test
  * Last change:  Thu Jul 13 08:27:46 2023 by hacluster via crmd on server1.test
  * 2 nodes configured
  * 10 resource instances configured

Node List:
  * Online: [ server1.test server2.test ]

Full List of Resources:
  * p_virtip    (ocf::heartbeat:IPaddr2):        Started server2.test
  * p_expfs_nfsshare_exports_HA (ocf::heartbeat:exportfs):       Started server2.test
  * p_fs_nfsshare_exports_HA    (ocf::heartbeat:Filesystem):     Started server2.test
  * p_nfsserver (ocf::heartbeat:nfsserver):      Started server2.test
  * p_pb_block  (ocf::heartbeat:portblock):      Started server2.test
  * p_pb_unblock        (ocf::heartbeat:portblock):      Started server2.test
  * Clone Set: ms_drbd_ha_nfs [p_drbd_ha_nfs] (promotable):
    * Masters: [ server2.test ]
    * Slaves: [ server1.test ]
  * Clone Set: c_drbd_attr [p_drbd_attr]:
    * Started: [ server1.test server2.test ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

DRBD status output:

[bebe@server2 share]$ sudo drbdadm status ha_nfs
ha_nfs role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate

Score:3

Sounds like misconfigured permissions. To troubleshoot, tear your setup down and try to re-create the failover NFS mount points from scratch.

P.S. It's a fragile setup in general: active-passive DRBD replication is prone to failover mount/unmount and misconfiguration issues like this one. Active-active block-level replication combined with a cluster-aware file system should be used instead.

Yonoss:
Do you have a link handy where I can find the procedure for setting up active-active block-level replication? I'm only looking for a software solution, not interested in dedicated hardware for this functionality. Thanks!
Stuka:
There are a few software options for active-active replication. I would take a look at two of them: Microsoft S2D and StarWind VSAN. I wouldn't use S2D on just two nodes, but it's definitely an option. https://learn.microsoft.com/en-us/azure-stack/hci/concepts/storage-spaces-direct-overview https://www.starwindsoftware.com/vsan
RiGiD5:
Enable KVM and run StarWind Virtual SAN. S2D requires an extra Windows license and is pretty horrible in general.
Matt Kereczman:
I certainly wouldn't call it "a fragile setup", but you do need to configure the cluster correctly. Things like fencing and quorum are often ignored, which will result in the issues Stuka mentioned in their answer. Active-active two-node clusters are much more fragile than active-passive two-node clusters, with failures that are much worse (like corruption).
Yonoss:
Yeah, I feel like DRBD + Pacemaker is a bit too complicated to set up and operate, plus I don't see much activity in responding to questions on the DRBD GitHub page, so I think I will drop this approach. I will try to achieve something similar to what DRBD is doing using Syncthing. It's way easier to install; it's peer to peer; I can add a lot of nodes; the only drawback I can see so far is that it is a bit slow to synchronize data across shared folders, but I think I can manage that. Thanks everyone for your support!
BaronSamedi1958:
@Matt Kereczman Dude... It's 100% FUD you feed to people here! All modern two-node replicated solutions built into hypervisors, e.g. VMware Virtual SAN and Microsoft Storage Spaces Direct, are active-active by design and experience no issues with data corruption.
Matt Kereczman:
@BaronSamedi1958 I think we're talking about two different things. In the Linux HA clustering world, active/active can refer to two or more nodes actively accessing the same block device at the same time. In VMware VSAN they seem to use the term active/active to designate synchronous replication (versus asynchronous or active/passive in their terms). You're still only reading from and writing to one node at a time on a per volume basis in a VSAN solution. Not trying to spread fear.
BaronSamedi1958:
I’m not sure why you keep spreading this misinformation :( Of course VMware VSAN has the virtual volume mounted on all the cluster nodes and all the nodes participate in I/O. “In traditional vSAN clusters, a virtual machine’s read operations are distributed across all replica copies of the data in the cluster. In the case of a policy setting of NumberOfFailuresToTolerate =1, which results in two copies of the data, 50% of the reads will come from replica1 and 50% will come from replica2.”
BaronSamedi1958:
“In the case of a policy setting of NumberOfFailuresToTolerate =2 in non-stretched vSAN clusters, which results in three copies of the data, 33% of the reads will come from replica1, 33% of the reads will come from replica2 and 33% will come from replica3.”
BaronSamedi1958:
Entertain yourself… https://core.vmware.com/resource/vsan-2-node-cluster-guide#section1
BaronSamedi1958:
Microsoft is no different. A shared virtual volume managed by S2D and hosting a clustered file system (CSVFS) is readable & writable by all the nodes within the cluster. So are shared container files (VHDX); this is how failover clustering works.
BaronSamedi1958:
Enjoy… https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/dn281956(v=ws.11)

Score:-3

My guess is the permissions are still not correct, or you set the permissions on the mount point before the server mounted the filesystem.

I would try a recursive chown and chmod on the mount point from the DRBD Primary while the filesystem is mounted. Also, I usually chown the root directory of my NFS exports to nobody:nobody, which might help if you're trying to write to the share from a client system as the root user (since root_squash is a default NFS export option). You could also try setting the options="no_root_squash" param on the exportfs resource just to see if that's what you're up against, but it's generally not something you want to leave enabled for security reasons.

Also, I usually set the options=rw parameter on the exportfs resources, but that might be a default.
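
For example, something along these lines, using the paths and resource names from your configuration (untested against your cluster, so treat it as a sketch rather than exact commands):

# on the current DRBD Primary (server2 at the moment), while the filesystem is mounted:
sudo chown -R nobody:nobody /nfsshare/exports/HA
sudo chmod -R 777 /nfsshare/exports/HA    # wide open just to rule permissions out; tighten afterwards

# temporarily disable root squashing on the export, only to test:
sudo pcs resource update p_expfs_nfsshare_exports_HA options="rw,mountpoint,no_root_squash"

If writes only succeed with no_root_squash, the client is writing as root and getting squashed to nobody; in that case fix the ownership on the export instead and put the options back the way they were.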

Matt Kereczman:
Awesome feedback on the downvotes, thanks! /s