NFS4 background client won't start on reboot and connection timeout

Private_Citizen

10/8/22, 11:51 PM

The Short Version

An NFS4 mount in fstab normally starts a background process that keeps retrying while the NFS host is unavailable. However, that background process fails to start in exactly one case: the client is rebooting and the connection times out. If the connection is refused during a reboot, or under any condition after a reboot, the background process starts.

The Long Version

This is on Oracle Linux 8, which is a RHEL 8 / CentOS 8 flavor. In /etc/fstab I have

[xx:xx:xx:xx:xx:xx:xx:xx]:/example  /mnt/example  nfs4  defaults  0  0

and in /etc/nfsmount.conf I have

[ NFSMount_Global_Options ]
defaultvers=4
nfsvers=4
background=true
rw=true
hard=true
sync=true
rsize=32k
wsize=32k
nordirplus=true
actimeo=3
_netdev=true

I confirmed in /proc/mounts that the global settings are applied to the mount when it works. When the background service starts as expected, you can see it running:

[root@01b1 /]# ps aufx | grep nfs
root 1077  0.0  0.0      0     0 ?        I<   18:31   0:00  \_ [nfsiod]
root 1506  0.0  0.1 221928  1036 pts/0    S+   19:16   0:00              \_ grep --color=auto nfs
root 1466  0.0  0.0  47812   588 ?        Ss   18:36   0:00 /sbin/mount.nfs4 [xx:xx:xx:xx:xx:xx:xx:xx]:/example /mnt/example -o rw
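The same check can be scripted instead of eyeballed. The following is a minimal sketch, assuming the 11-column `ps aux` layout shown above; the function name `find_background_mounts` and the sample text are illustrative, not part of any tool.

```python
import os

# Sample copied from the ps output above (raw string keeps the "\_" tree markers).
SAMPLE_PS = r"""
root 1077  0.0  0.0      0     0 ?        I<   18:31   0:00  \_ [nfsiod]
root 1506  0.0  0.1 221928  1036 pts/0    S+   19:16   0:00              \_ grep --color=auto nfs
root 1466  0.0  0.0  47812   588 ?        Ss   18:36   0:00 /sbin/mount.nfs4 [xx:xx:xx:xx:xx:xx:xx:xx]:/example /mnt/example -o rw
"""


def find_background_mounts(ps_output):
    """Return PIDs of processes whose executable is a mount.nfs* helper."""
    pids = []
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        # Forest mode (the "f" in "aufx") prefixes child commands with "\_ ".
        if command.startswith("\\_ "):
            command = command[3:]
        executable = command.split()[0]
        if os.path.basename(executable).startswith("mount.nfs"):
            pids.append(int(fields[1]))
    return pids


print(find_background_mounts(SAMPLE_PS))
```

Matching on the executable name rather than on the substring "nfs" avoids counting the `grep` process and the `[nfsiod]` kernel thread.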

To be clear, when the host server is available, the mount works every time. Everything below describes what happens when the host isn't available, for example because it is rebooting, the port is blocked, or nfs-server is stopped.

When the background service does start on reboot, /var/log/messages shows this:

Jun  8 17:42:29 01b1 systemd[1]: mnt-example.mount: Directory /mnt/example to mount over is not empty, mounting anyway.
Jun  8 17:42:29 01b1 systemd[1]: Mounting /mnt/example...
Jun  8 17:42:31 01b1 mount[1027]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying
Jun  8 17:42:31 01b1 mount[1018]: mount.nfs4: backgrounding "[xx:xx:xx:xx:xx:xx:xx:xx]:/example"
Jun  8 17:42:31 01b1 mount[1018]: mount.nfs4: mount options: "rw,vers=4,bg,rw,hard,sync,rsize=32768,wsize=32768,nordirplus,actimeo=3,_netdev"
Jun  8 17:42:31 01b1 systemd[1]: mnt-example.mount: Mount process finished, but there is no mount.
Jun  8 17:42:31 01b1 systemd[1]: mnt-example.mount: Failed with result 'protocol'.
Jun  8 17:42:31 01b1 systemd[1]: Failed to mount /mnt/example.
Jun  8 17:42:32 01b1 mount[1128]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying
Jun  8 17:42:34 01b1 mount[1128]: mount to NFS server 'xx:xx:xx:xx:xx:xx:xx:xx' failed: Connection refused, retrying

But when the background service fails to start on reboot, it shows this:

Jun  8 17:49:05 01b1 systemd[1]: mnt-example.mount: Directory /mnt/example to mount over is not empty, mounting anyway.
Jun  8 17:49:05 01b1 systemd[1]: Mounting /mnt/example...
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Mounting timed out. Terminating.
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Mount process exited, code=killed status=15
Jun  8 17:50:35 01b1 systemd[1]: mnt-example.mount: Failed with result 'timeout'.
Jun  8 17:50:35 01b1 systemd[1]: Failed to mount /mnt/example.
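The timestamps in the two logs can be compared directly. The sketch below computes the gap between "Mounting" and the next decisive event in each case. Syslog lines carry no year, so the `year` argument is an assumption used only to make parsing unambiguous; the helper name `syslog_elapsed_seconds` is illustrative.

```python
from datetime import datetime


def syslog_elapsed_seconds(start, end, year=2022):
    """Seconds between two syslog timestamps such as 'Jun  8 17:49:05'.

    The year is an assumption (syslog omits it); it does not affect
    differences between timestamps on the same day.
    """
    fmt = "%Y %b %d %H:%M:%S"

    def parse(stamp):
        # Collapse the double space syslog uses for single-digit days.
        return datetime.strptime(f"{year} {' '.join(stamp.split())}", fmt)

    return (parse(end) - parse(start)).total_seconds()


# Working case: "Mounting" until mount.nfs4 reports backgrounding.
print(syslog_elapsed_seconds("Jun  8 17:42:29", "Jun  8 17:42:31"))
# Failing case: "Mounting" until systemd reports "Mounting timed out".
print(syslog_elapsed_seconds("Jun  8 17:49:05", "Jun  8 17:50:35"))
```

In the failing case the gap is 90 seconds, which equals systemd's default start timeout (DefaultTimeoutStartSec=90s). That is consistent with systemd killing the mount process before it ever reached the point of backgrounding itself, although nothing in the logs confirms this on its own.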

On reboots when the host port is open, but the nfs-server isn't running, the connection is refused, and the background service is started.

On reboots when the host is off or the port is blocked, the connection times out, and the background service is not started.

If the host isn't available, the client isn't rebooting, and I manually run mount -a, the background service always starts, whether the connection is refused or times out.

Any idea why the background service fails to start only when the client is rebooting and the connection times out? Is there a way to make the background service always start when the host isn't available?

--- UPDATE ---

I've been trying random settings to see if something would help. I found that adjusting the timeout/retry settings allowed the background service to start up at reboot on both timed-out and refused connections.

/etc/nfsmount.conf

# Default timeo=600
# Default retrans=2
timeo=20
retrans=4
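For scale, nfs(5) documents timeo in tenths of a second. The sketch below converts both values and compares them with the 90 seconds that elapse between "Mounting" and "Mounting timed out" in the failing log. It is a rough comparison only: it does not model how many attempts mount.nfs4 makes before it backgrounds, and the names used are illustrative.

```python
# Gap between "Mounting" (17:49:05) and "Mounting timed out" (17:50:35) in the failing log.
OBSERVED_KILL_SECONDS = 90


def timeo_to_seconds(timeo):
    """Convert an NFS timeo value (tenths of a second) to seconds."""
    return timeo / 10


for label, timeo in (("default", 600), ("tuned", 20)):
    wait = timeo_to_seconds(timeo)
    # How many such waits fit before the mount process was terminated.
    fits = OBSERVED_KILL_SECONDS / wait
    print(f"{label}: timeo={timeo} -> {wait:.0f}s per wait, {fits:.1f} waits fit in 90s")
```

With the default, a single 60-second wait already uses most of the 90-second window; with timeo=20 each wait is 2 seconds, so the client has far more room to give up and background before being terminated.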

However, I feel like this is a band-aid and not a solution. It doesn't make sense to me unless it's a RHEL bug. I'm guessing some kind of race condition occurs with the longer timeout. I don't trust this fix, because if the conditions behind the race change, it could break again.

mount

automount

nfs4
