NFS4 background client won't start on reboot when the connection times out

The Short Version

An NFS4 mount in fstab normally starts a background service that keeps retrying while the host is unavailable. The background service fails to start in exactly one case: during a reboot, when the connection times out. If the connection is refused during a reboot, or under any condition after a reboot, the background service starts.

The Long Version

This is on Oracle Linux 8, which is a RHEL 8 / CentOS 8 flavor. In /etc/fstab I have

[xx:xx:xx:xx:xx:xx:xx:xx]:/example  /mnt/example  nfs4  defaults  0  0

and in /etc/nfsmount.conf I have

[ NFSMount_Global_Options ]
defaultvers=4
nfsvers=4
background=true
rw=true
hard=true
sync=true
rsize=32k
wsize=32k
nordirplus=true
actimeo=3
_netdev=true

I confirmed in /proc/mounts that the global settings are applied to the mount when it works. When the background service starts as expected, you can see it running:

[root@01b1 /]# ps aufx | grep nfs
root 1077  0.0  0.0      0     0 ?        I<   18:31   0:00  \_ [nfsiod]
root 1506  0.0  0.1 221928  1036 pts/0    S+   19:16   0:00              \_ grep --color=auto nfs
root 1466  0.0  0.0  47812   588 ?        Ss   18:36   0:00 /sbin/mount.nfs4 [xx:xx:xx:xx:xx:xx:xx:xx]:/example /mnt/example -o rw

To be clear, when the host server is available the mount works every time. Everything below describes what happens when the host isn't available, for example because it's rebooting, the port is blocked, or nfs-server is stopped.

When the background service starts on reboot, /var/log/messages shows this:

Jun  8 17:42:29 01b1 systemd[1]: mnt-example.mount: Directory /mnt/example to mount over is not empty, mounting anyway.
Jun  8 17:42:29 01b1 systemd[1]: Mounting /mnt/example...
Jun  8 17:42:31 01b1 mount[1027]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying
Jun  8 17:42:31 01b1 mount[1018]: mount.nfs4: backgrounding "[xx:xx:xx:xx:xx:xx:xx:xx]:/example"
Jun  8 17:42:31 01b1 mount[1018]: mount.nfs4: mount options: "rw,vers=4,bg,rw,hard,sync,rsize=32768,wsize=32768,nordirplus,actimeo=3,_netdev"
Jun  8 17:42:31 01b1 systemd[1]: mnt-example.mount: Mount process finished, but there is no mount.
Jun  8 17:42:31 01b1 systemd[1]: mnt-example.mount: Failed with result 'protocol'.
Jun  8 17:42:31 01b1 systemd[1]: Failed to mount /mnt/example.
Jun  8 17:42:32 01b1 mount[1128]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying
Jun  8 17:42:34 01b1 mount[1128]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying

But when the background service fails to start on reboot, it shows this:

Jun  8 17:49:05 01b1 systemd[1]: mnt-example.mount: Directory /mnt/example to mount over is not empty, mounting anyway.
Jun  8 17:49:05 01b1 systemd[1]: Mounting /mnt/example...
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Mounting timed out. Terminating.
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Mount process exited, code=killed status=15
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Failed with result 'timeout'.
Jun  8 17:50:35 01b1 systemd[1]: Failed to mount /mnt/example.

On reboots when the host port is open, but the nfs-server isn't running, the connection is refused, and the background service is started.

On reboots when the host is off or the port is blocked, the connection times out, and the background service is not started.
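
For reference, the gap between "Mounting /mnt/example..." and "Mounting timed out. Terminating." in the failing log is 90 seconds, which matches systemd's default start timeout (DefaultTimeoutStartSec=90s). The log shows systemd terminating the mount process; whether mount.nfs4 would have backgrounded itself had it been given more time is an assumption the log does not prove. A minimal sketch of the arithmetic, assuming GNU date is available:

```shell
# Timestamps copied from the failing log above.
start="17:49:05"
end="17:50:35"

# Convert both to epoch seconds on the same (arbitrary) day and subtract.
elapsed=$(( $(date -u -d "1970-01-01 $end" +%s) - $(date -u -d "1970-01-01 $start" +%s) ))

echo "$elapsed"
```

This prints 90.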

If the host isn't available, the client isn't rebooting, and I manually run mount -a, the background service always starts, whether the connection is refused or times out.

Any idea why the background service fails to start only when the client is rebooting and the connection times out? Is there a way to make the background service always start when the host isn't available?

--- UPDATE ---

I've been trying random settings to see if something would help. I found that adjusting the timeout/retry settings allowed the background service to start at reboot on both timed-out and refused connections.

/etc/nfsmount.conf

# Default timeo=600
# Default retrans=2
timeo=20
retrans=4

However, I feel like this is a band-aid and not a solution. It doesn't make sense to me unless it's a RHEL bug. I'm guessing some kind of race is happening with the longer timeout. I don't trust this fix, because if the race conditions change it could break again.
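
One possible reading of why the shorter values help, sketched under stated assumptions: nfs(5) documents timeo in tenths of a second, with linear backoff between retransmissions. If the foreground mount attempt has to sit through those waits before mount.nfs4 backgrounds itself (an assumption, not confirmed here), then the defaults would outlast the 90-second window seen in the failing log, while timeo=20 with retrans=4 would fit well inside it. The helper total_wait below is hypothetical and only models that arithmetic; the exact number of waits the client performs may differ.

```shell
# Hypothetical helper: rough total wait in seconds for a given
# timeo (deciseconds) and retrans, assuming linear backoff where
# the i-th wait lasts i * timeo. This is a model, not mount.nfs logic.
total_wait() {
  timeo=$1
  retrans=$2
  total=0
  i=1
  while [ "$i" -le "$retrans" ]; do
    total=$(( total + i * timeo ))
    i=$(( i + 1 ))
  done
  # Convert deciseconds to seconds.
  echo $(( total / 10 ))
}

total_wait 600 2   # defaults from the comments above
total_wait 20 4    # values from the workaround
```

Under this model the defaults add up to 180 seconds and the workaround values to 20 seconds, so only the latter finishes before a 90-second cutoff.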
