Score:1

Kubernetes can't mount NFS volumes after NFS server update and reboot

After updating the NFS server on openSUSE Leap 15.2 to the latest version with zypper patch and rebooting, nodes in the Kubernetes cluster (OpenShift 4.5) can no longer mount NFS volumes.

NFS server version: nfs-kernel-server-2.1.1-lp152.9.12.1.x86_64

/etc/exports contains:

/nfs 192.168.11.*(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

Affected pods remain in ContainerCreating status.

kubectl describe pod/<pod_name> gives the following error:

Warning  FailedMount  31m   kubelet            MountVolume.SetUp failed for volume "volume" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/smart-services-http-video-stream --scope -- mount -t nfs nfs.example.invalid:/nfs/volume /var/lib/kubelet/pods/c86dee2e-f533-43c9-9a1d-c4f00a1b8eef/volumes/kubernetes.io~nfs/pv-name
Output: Running scope as unit: run-r83d4e7dba1b645aca1e4693a48f45191.scope
mount.nfs: Operation not permitted

The server runs NFSv4 only, so rpcbind is turned off and showmount commands do not work.

Mounting directly on a Kubernetes node results in the following error:

sudo mount.nfs4 nfs.example.invalid:/core tmp/ -v; echo $?
mount.nfs4: timeout set for Wed Jul 21 12:16:49 2021
mount.nfs4: trying text-based options 'vers=4.2,addr=192.168.11.2,clientaddr=192.168.11.3'
mount.nfs4: mount(2): Operation not permitted
mount.nfs4: Operation not permitted
32

firewalld rules on NFS server:

  services: ssh dhcpv6-client nfs mountd rpc-bind samba http tftp
  ports: 2049/tcp 2049/udp

AppArmor was running; turning it off did not change the outcome.

Before the NFS server was updated, everything worked fine, and no other configuration changes were made. How can I debug this further and make the shares mountable again?

Score:3

After trying to debug this issue with rpcdebug to no avail, I resorted to dumping the traffic arriving at the NFS server from one of the nodes. The dump gave an interesting lead:

NFS reply xid 4168498669 reply ERR 20: Auth Bogus Credentials (seal broken)

So the issue was certainly not related to the network or AppArmor.
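
A capture like the one described can be taken with tcpdump, for example. The answer does not name the tool, so tcpdump, the helper name nfs_capture_filter and the node address 192.168.11.3 (the clientaddr shown in the question) are assumptions in this sketch. The helper only builds a filter expression that limits the capture to one client and the NFS port 2049.

```shell
#!/bin/sh
# Hypothetical helper: build a pcap filter expression for NFS traffic
# (port 2049) to or from a single client address.
nfs_capture_filter() {
    client_ip=$1
    printf 'host %s and port 2049\n' "$client_ip"
}

# Example use on the NFS server (needs root; not run here):
#   tcpdump -i any -nn "$(nfs_capture_filter 192.168.11.3)"
```

Restricting the capture to a single node keeps the output small enough to spot RPC-level replies such as the authentication error quoted above.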

Then I tried changing the exports to

/nfs *(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

and everything worked, confirming that the issue lay in some sort of exports misconfiguration.

Rewriting the rule to

/nfs 192.168.11.0/24(rw,sync,no_wdelay,root_squash,insecure,no_subtree_check,fsid=0)

restored connectivity.
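
The rewrite above replaces a wildcard in the last octet with the equivalent /24 network. A minimal sketch of that substitution as a sed filter follows. The function name ip_wildcard_to_cidr is hypothetical, it only handles the a.b.c.* form, and its output should be reviewed before it replaces /etc/exports.

```shell
#!/bin/sh
# Hypothetical helper: rewrite client specs of the form a.b.c.* to a.b.c.0/24.
# Reads exports-style lines on stdin and writes the result to stdout.
ip_wildcard_to_cidr() {
    sed -E 's#([[:space:]][0-9]+\.[0-9]+\.[0-9]+)\.\*#\1.0/24#g'
}

# Example (not run here): preview the change, then reload the export table
# as root once /etc/exports has been edited.
#   ip_wildcard_to_cidr < /etc/exports
#   exportfs -ra    # re-export everything listed in /etc/exports
#   exportfs -v     # list active exports with their options
```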

According to https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/deployment_guide/s1-nfs-server-config-exports

wildcards — Where a * or ? character is used to take into account a grouping of fully qualified domain names that match a particular string of letters. Wildcards should not be used with IP addresses; however, it is possible for them to work accidentally if reverse DNS lookups fail.

So using * with an IP address was a clear misconfiguration that somehow worked for months and finally resulted in the errors described in the question.
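
To catch this kind of entry before it causes trouble, exports lines can be scanned for client fields that look like an IPv4 address containing * or ?. This is a rough sketch rather than an official tool: the function name find_ip_wildcards and the matching rule are assumptions, and it only inspects whitespace-separated client fields.

```shell
#!/bin/sh
# Hypothetical lint: print exports-style lines (read from stdin) that have a
# client field starting with digits and a dot and containing * or ?.
find_ip_wildcards() {
    awk '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the exported path
                client = $i
                sub(/\(.*/, "", client)      # drop the "(options)" suffix
                if (client ~ /^[0-9]+\.[0-9.*?]*[*?][0-9.*?]*$/) {
                    print
                    next
                }
            }
        }'
}

# Example (not run here):
#   find_ip_wildcards < /etc/exports
```

Hostname wildcards such as *.example.com and CIDR entries such as 192.168.11.0/24 are not reported, since only the IP-with-wildcard form is the problem described here.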
