NFSv4 TCP reconnect fails after network failure due to iptables not tracking connection

I have an NFSv4 setup over TCP. /etc/fstab:

nfs-server:/share /mount nfs4    tcp,hard,intr,rw,port=2049      0 0

A few days ago we had a network failure, and multiple clients are still stuck trying to reconnect. I managed to pinpoint the issue: the client's iptables drops the SYN/ACK packets coming back from nfs-server.
On the client, netstat --numeric-ports shows the connection stuck in SYN_SENT:

tcp        0      1 nfs-client:983 nfs-server:2049 SYN_SENT
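Because several clients are affected, it may help to look for the same stuck socket without running netstat by hand on each machine. The Python sketch below reads /proc/net/tcp (the file netstat itself reads) and reports IPv4 sockets in SYN_SENT towards port 2049. It assumes the standard /proc/net/tcp layout, where the state column is hexadecimal (02 is SYN_SENT) and addresses are written as HEXIP:HEXPORT; IPv6 sockets would be in /proc/net/tcp6 and are not covered. The function name stuck_nfs_sockets is hypothetical.

```python
import os
from typing import Iterable, List, Tuple

# In /proc/net/tcp the "st" column is the kernel TCP state in hex; 02 == SYN_SENT.
SYN_SENT = "02"


def stuck_nfs_sockets(lines: Iterable[str], nfs_port: int = 2049) -> List[Tuple[int, int]]:
    """Return (local_port, remote_port) for IPv4 sockets in SYN_SENT towards nfs_port."""
    found = []
    for line in lines:
        fields = line.split()
        # Data rows start with a slot number such as "5:"; this skips the header row.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        # Addresses are "HEXIP:HEXPORT"; only the port part is decoded here.
        local_port = int(local.rsplit(":", 1)[1], 16)
        remote_port = int(remote.rsplit(":", 1)[1], 16)
        if state == SYN_SENT and remote_port == nfs_port:
            found.append((local_port, remote_port))
    return found


if __name__ == "__main__" and os.path.exists("/proc/net/tcp"):
    with open("/proc/net/tcp") as f:
        for lport, rport in stuck_nfs_sockets(f):
            print(f"SYN_SENT: local port {lport} -> remote port {rport}")
```

On the hung client above, this would be expected to report local port 983 -> remote port 2049, matching the netstat line.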

The client's journalctl shows:

kernel: [UFW BLOCK] IN=eno1.490 OUT= MAC=<> SRC=nfs-server DST=nfs-client LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=2049 DPT=983 WINDOW=65160 RES=0x00 ACK SYN URGP=0
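To scan other clients' kernel logs (for example the output of journalctl -k) for the same dropped SYN/ACK, a [UFW BLOCK] line can be split into its KEY=VALUE fields and its bare TCP flag words. A minimal Python sketch, assuming the log format shown above; parse_ufw_log and is_blocked_nfs_synack are hypothetical helper names.

```python
from typing import Dict, Set, Tuple

# TCP flag names as printed by the iptables LOG target.
TCP_FLAGS = {"CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN"}


def parse_ufw_log(line: str) -> Tuple[Dict[str, str], Set[str]]:
    """Split a "[UFW BLOCK]" kernel log line into KEY=VALUE fields and TCP flags."""
    _, _, payload = line.partition("[UFW BLOCK]")
    fields: Dict[str, str] = {}
    flags: Set[str] = set()
    for token in payload.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
        elif token in TCP_FLAGS:  # other bare tokens such as DF (an IP flag) are ignored
            flags.add(token)
    return fields, flags


def is_blocked_nfs_synack(line: str, nfs_port: str = "2049") -> bool:
    """True if the logged packet is a TCP SYN/ACK sent from the NFS port."""
    fields, flags = parse_ufw_log(line)
    return (fields.get("PROTO") == "TCP"
            and fields.get("SPT") == nfs_port
            and {"SYN", "ACK"} <= flags)
```

For the log line above this yields SPT=2049, DPT=983 and the flags ACK and SYN, i.e. the server's SYN/ACK to the client's reserved port 983.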
kernel: nfs: server nfs-server not responding, timed out

That is, iptables is dropping the SYN/ACK packets from nfs-server. Tracing through the iptables rules, the packets appear to be dropped because connection tracking considers the connection to be in an INVALID state.

On another NFS client, the NFS connection re-established immediately once I added a rule, evaluated before all others, that accepts all traffic from nfs-server source port 2049: iptables -A ufw-before-logging-input -j ACCEPT -p tcp -m tcp --sport 2049 -s nfs-server

The output of sudo tcpdump -vv -i eno1.490 tcp and port 2049 looks like a valid connection handshake that iptables should be able to track:

16:16:58.750800 IP (tos 0x0, ttl 64, id 47308, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xdc81), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034143399 ecr 0,nop,wscale 8], length 0
16:16:58.751113 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0x0070 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673910248 ecr 3034143399,nop,wscale 7], length 0
16:16:59.801792 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xfc54 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673911299 ecr 3034143399,nop,wscale 7], length 0
16:17:00.766804 IP (tos 0x0, ttl 64, id 47309, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xd4a1), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034145415 ecr 0,nop,wscale 8], length 0
16:17:00.767112 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xf88f (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673912264 ecr 3034143399,nop,wscale 7], length 0
16:17:02.809769 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xf094 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673914307 ecr 3034143399,nop,wscale 7], length 0
16:17:04.894811 IP (tos 0x0, ttl 64, id 47310, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xc481), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034149543 ecr 0,nop,wscale 8], length 0
16:17:04.895124 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xe86f (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673916392 ecr 3034143399,nop,wscale 7], length 0
16:17:08.953736 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xd894 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673920451 ecr 3034143399,nop,wscale 7], length 0
16:17:13.086804 IP (tos 0x0, ttl 64, id 47311, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xa481), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034157735 ecr 0,nop,wscale 8], length 0
16:17:13.087165 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xc86f (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673924584 ecr 3034143399,nop,wscale 7], length 0
16:17:21.561776 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xa754 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673933059 ecr 3034143399,nop,wscale 7], length 0
16:17:29.214807 IP (tos 0x0, ttl 64, id 47312, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0x6581), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034173863 ecr 0,nop,wscale 8], length 0
16:17:29.215123 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0x896f (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673940712 ecr 3034143399,nop,wscale 7], length 0
16:17:45.625765 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0x4954 (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673957123 ecr 3034143399,nop,wscale 7], length 0
16:18:03.262820 IP (tos 0x0, ttl 64, id 47313, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xe080), seq 2846088619, win 64240, options [mss 1460,sackOK,TS val 3034207911 ecr 0,nop,wscale 8], length 0
16:18:03.263233 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0x046f (correct), seq 3589064398, ack 2846088620, win 65160, options [mss 1460,sackOK,TS val 1673974760 ecr 3034143399,nop,wscale 7], length 0
16:18:27.839309 IP (tos 0x0, ttl 64, id 27773, offset 0, flags [DF], proto TCP (6), length 60)
nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xc817), seq 4254089766, win 64240, options [mss 1460,sackOK,TS val 3034232488 ecr 0,nop,wscale 8], length 0
16:18:27.839634 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
nfs-server.nfs > nfs-client.983: Flags [.], cksum 0xcfc5 (correct), seq 1, ack 1, win 509, options [nop,nop,TS val 1673999337 ecr 3034143399], length 0
16:18:27.839676 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
nfs-client.983 > nfs-server.nfs: Flags [R], cksum 0x4ed4 (correct), seq 2846088620, win 0, length 0
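The handshake in this capture can also be checked mechanically. The Python sketch below pulls src, dst, flags, seq and ack out of the tcpdump detail lines (the timestamp/IP-header lines do not match the regex and are skipped) and confirms that every SYN/ACK acknowledges the client's initial sequence number + 1, then lists what follows once the client sends a SYN with a different ISN. The regex assumes the tcpdump -vv line layout shown above, the helper names parse/summarize are hypothetical, the sample lines are abridged copies of lines from the capture, and relative numbers such as "seq 1, ack 1" are reported as printed rather than resolved to absolute values.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Matches the detail line of each packet in `tcpdump -vv` output.
PKT_RE = re.compile(
    r"^(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]*)\]"
    r".*?\bseq (?P<seq>\d+)(?:, ack (?P<ack>\d+))?"
)


@dataclass
class Packet:
    src: str
    dst: str
    flags: str
    seq: int
    ack: Optional[int]


def parse(lines: List[str]) -> List[Packet]:
    packets = []
    for line in lines:
        m = PKT_RE.search(line.strip())
        if m:
            ack = int(m["ack"]) if m["ack"] is not None else None
            packets.append(Packet(m["src"], m["dst"], m["flags"], int(m["seq"]), ack))
    return packets


def summarize(packets: List[Packet], client: str = "nfs-client.983") -> Dict[str, object]:
    syns = [p for p in packets if p.src == client and p.flags == "S"]
    synacks = [p for p in packets if p.dst == client and p.flags == "S."]
    isns = list(dict.fromkeys(p.seq for p in syns))  # distinct ISNs, in order seen
    first_isn = isns[0]
    # Packets that follow the first SYN carrying a different ISN, if there is one.
    after_new_syn: List[tuple] = []
    if len(isns) > 1:
        idx = next(i for i, p in enumerate(packets)
                   if p.src == client and p.flags == "S" and p.seq == isns[1])
        after_new_syn = [(p.src, p.flags) for p in packets[idx + 1:]]
    return {
        "client_isns": isns,
        "synack_count": len(synacks),
        # A SYN/ACK must acknowledge the client's ISN + 1 (mod 2**32).
        "synacks_ack_first_isn": all(p.ack == (first_isn + 1) % 2**32 for p in synacks),
        "after_new_syn": after_new_syn,
    }


# Abridged detail lines from the capture (TCP options trimmed).
SAMPLE = [
    "nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xdc81), seq 2846088619, win 64240",
    "nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0x0070 (correct), seq 3589064398, ack 2846088620, win 65160",
    "nfs-server.nfs > nfs-client.983: Flags [S.], cksum 0xfc54 (correct), seq 3589064398, ack 2846088620, win 65160",
    "nfs-client.983 > nfs-server.nfs: Flags [S], cksum 0xce12 (incorrect -> 0xc817), seq 4254089766, win 64240",
    "nfs-server.nfs > nfs-client.983: Flags [.], cksum 0xcfc5 (correct), seq 1, ack 1, win 509",
    "nfs-client.983 > nfs-server.nfs: Flags [R], cksum 0x4ed4 (correct), seq 2846088620, win 0, length 0",
]

if __name__ == "__main__":
    print(summarize(parse(SAMPLE)))
```

On this sample, the SYN/ACKs all carry ack 2846088620 (the first ISN 2846088619 + 1), and after the client switches to ISN 4254089766 the server answers with a bare ACK rather than a SYN/ACK, which the client resets.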

Below is the iptables-save dump from the machine that is still hung. The journalctl entries come from ufw-logging-deny. Since the destination IP address is bound to a local interface, I don't think -A ufw-not-local -m limit --limit 3/min --limit-burst 10 -j ufw-logging-deny is what drops the traffic; instead, I believe it is dropped by -A ufw-before-input -m conntrack --ctstate INVALID -j DROP.

# Generated by iptables-save v1.8.4 on Fri Dec  9 16:24:38 2022
*nat
:PREROUTING ACCEPT [1193931:94102423]
:INPUT ACCEPT [1119191:82192319]
:OUTPUT ACCEPT [1891891:163749867]
:POSTROUTING ACCEPT [1891909:163751857]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Fri Dec  9 16:24:38 2022
# Generated by iptables-save v1.8.4 on Fri Dec  9 16:24:38 2022
*filter
:INPUT DROP [810:61109]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [873:34920]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
:ufw-after-forward - [0:0]
:ufw-after-input - [0:0]
:ufw-after-logging-forward - [0:0]
:ufw-after-logging-input - [0:0]
:ufw-after-logging-output - [0:0]
:ufw-after-output - [0:0]
:ufw-before-forward - [0:0]
:ufw-before-input - [0:0]
:ufw-before-logging-forward - [0:0]
:ufw-before-logging-input - [0:0]
:ufw-before-logging-output - [0:0]
:ufw-before-output - [0:0]
:ufw-logging-allow - [0:0]
:ufw-logging-deny - [0:0]
:ufw-not-local - [0:0]
:ufw-reject-forward - [0:0]
:ufw-reject-input - [0:0]
:ufw-reject-output - [0:0]
:ufw-skip-to-policy-forward - [0:0]
:ufw-skip-to-policy-input - [0:0]
:ufw-skip-to-policy-output - [0:0]
:ufw-track-forward - [0:0]
:ufw-track-input - [0:0]
:ufw-track-output - [0:0]
:ufw-user-forward - [0:0]
:ufw-user-input - [0:0]
:ufw-user-limit - [0:0]
:ufw-user-limit-accept - [0:0]
:ufw-user-logging-forward - [0:0]
:ufw-user-logging-input - [0:0]
:ufw-user-logging-output - [0:0]
:ufw-user-output - [0:0]
-A INPUT -j ufw-before-logging-input
-A INPUT -j ufw-before-input
-A INPUT -j ufw-after-input
-A INPUT -j ufw-after-logging-input
-A INPUT -j ufw-reject-input
-A INPUT -j ufw-track-input
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -j ufw-before-logging-forward
-A FORWARD -j ufw-before-forward
-A FORWARD -j ufw-after-forward
-A FORWARD -j ufw-after-logging-forward
-A FORWARD -j ufw-reject-forward
-A FORWARD -j ufw-track-forward
-A OUTPUT -j ufw-before-logging-output
-A OUTPUT -j ufw-before-output
-A OUTPUT -j ufw-after-output
-A OUTPUT -j ufw-after-logging-output
-A OUTPUT -j ufw-reject-output
-A OUTPUT -j ufw-track-output
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
-A ufw-after-input -p udp -m udp --dport 137 -j ufw-skip-to-policy-input
-A ufw-after-input -p udp -m udp --dport 138 -j ufw-skip-to-policy-input
-A ufw-after-input -p tcp -m tcp --dport 139 -j ufw-skip-to-policy-input
-A ufw-after-input -p tcp -m tcp --dport 445 -j ufw-skip-to-policy-input
-A ufw-after-input -p udp -m udp --dport 67 -j ufw-skip-to-policy-input
-A ufw-after-input -p udp -m udp --dport 68 -j ufw-skip-to-policy-input
-A ufw-after-input -m addrtype --dst-type BROADCAST -j ufw-skip-to-policy-input
-A ufw-after-logging-forward -m limit --limit 3/min --limit-burst 10 -j LOG --log-prefix "[UFW BLOCK] "
-A ufw-after-logging-input -m limit --limit 3/min --limit-burst 10 -j LOG --log-prefix "[UFW BLOCK] "
-A ufw-before-forward -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A ufw-before-forward -p icmp -m icmp --icmp-type 3 -j ACCEPT
-A ufw-before-forward -p icmp -m icmp --icmp-type 11 -j ACCEPT
-A ufw-before-forward -p icmp -m icmp --icmp-type 12 -j ACCEPT
-A ufw-before-forward -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A ufw-before-forward -j ufw-user-forward
-A ufw-before-input -i lo -j ACCEPT
-A ufw-before-input -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A ufw-before-input -m conntrack --ctstate INVALID -j ufw-logging-deny
-A ufw-before-input -m conntrack --ctstate INVALID -j DROP
-A ufw-before-input -p icmp -m icmp --icmp-type 3 -j ACCEPT
-A ufw-before-input -p icmp -m icmp --icmp-type 11 -j ACCEPT
-A ufw-before-input -p icmp -m icmp --icmp-type 12 -j ACCEPT
-A ufw-before-input -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A ufw-before-input -p udp -m udp --sport 67 --dport 68 -j ACCEPT
-A ufw-before-input -j ufw-not-local
-A ufw-before-input -d 224.0.0.251/32 -p udp -m udp --dport 5353 -j ACCEPT
-A ufw-before-input -d 239.255.255.250/32 -p udp -m udp --dport 1900 -j ACCEPT
-A ufw-before-input -j ufw-user-input
-A ufw-before-output -o lo -j ACCEPT
-A ufw-before-output -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A ufw-before-output -j ufw-user-output
-A ufw-logging-allow -m limit --limit 3/min --limit-burst 10 -j LOG --log-prefix "[UFW ALLOW] "
-A ufw-logging-deny -m conntrack --ctstate INVALID -m limit --limit 3/min --limit-burst 10 -j RETURN
-A ufw-logging-deny -m limit --limit 3/min --limit-burst 10 -j LOG --log-prefix "[UFW BLOCK] "
-A ufw-not-local -m addrtype --dst-type LOCAL -j RETURN
-A ufw-not-local -m addrtype --dst-type MULTICAST -j RETURN
-A ufw-not-local -m addrtype --dst-type BROADCAST -j RETURN
-A ufw-not-local -m limit --limit 3/min --limit-burst 10 -j ufw-logging-deny
-A ufw-not-local -j DROP
-A ufw-skip-to-policy-forward -j DROP
-A ufw-skip-to-policy-input -j DROP
-A ufw-skip-to-policy-output -j ACCEPT
-A ufw-track-output -p tcp -m conntrack --ctstate NEW -j ACCEPT
-A ufw-track-output -p udp -m conntrack --ctstate NEW -j ACCEPT
-A ufw-user-input -s xxx.0/24 -p tcp -m tcp --dport 22 -j ACCEPT
-A ufw-user-input -s xxx.139/32 -p tcp -m tcp --dport 10050 -j ACCEPT
-A ufw-user-input -s xxx.177/32 -p tcp -m tcp --dport 10050 -j ACCEPT
-A ufw-user-input -s xxx.144/32 -p tcp -m tcp --dport 9618 -j ACCEPT
-A ufw-user-input -s xxx.100/32 -p tcp -m tcp --dport 9618 -j ACCEPT
-A ufw-user-input -s xxx.101/32 -p tcp -m tcp --dport 9618 -j ACCEPT
-A ufw-user-input -s xxx.219/32 -p tcp -m tcp --dport 22 -j ACCEPT
-A ufw-user-input -s xxx.101/32 -p tcp -m tcp --dport 9618 -j ACCEPT
-A ufw-user-input -s xxx.129/32 -p tcp -m tcp --dport 983 -j ACCEPT
-A ufw-user-limit -m limit --limit 3/min -j LOG --log-prefix "[UFW LIMIT BLOCK] "
-A ufw-user-limit -j REJECT --reject-with icmp-port-unreachable
-A ufw-user-limit-accept -j ACCEPT
COMMIT
# Completed on Fri Dec  9 16:24:38 2022
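To make the rule-ordering argument concrete, here is a deliberately simplified Python model of first-match evaluation along the INPUT path of this dump. It is a hypothetical sketch, not how netfilter is implemented: only the rules relevant to the argument are modeled (LOG/limit rules, ufw-logging-deny, ufw-not-local and the ufw-after-* chains are left out), the conntrack state is supplied by hand as a ctstate field rather than computed, and treating the redacted xxx.129 address as nfs-server is an assumption. It shows that a packet classified INVALID hits the DROP in ufw-before-input before ever reaching ufw-user-input, while the workaround rule in ufw-before-logging-input is evaluated first.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, object]
Rule = Tuple[str, Callable[[Packet], bool], str]  # (label, match, target)
Verdict = Tuple[str, str]                          # (ACCEPT or DROP, matching rule)


def always(_: Packet) -> bool:
    return True


CHAINS: Dict[str, List[Rule]] = {
    "INPUT": [
        ("-j ufw-before-logging-input", always, "ufw-before-logging-input"),
        ("-j ufw-before-input", always, "ufw-before-input"),
    ],
    "ufw-before-logging-input": [],  # empty in the dump
    "ufw-before-input": [
        ("-i lo -j ACCEPT", lambda p: p["iface"] == "lo", "ACCEPT"),
        ("--ctstate RELATED,ESTABLISHED -j ACCEPT",
         lambda p: p["ctstate"] in ("RELATED", "ESTABLISHED"), "ACCEPT"),
        ("--ctstate INVALID -j DROP", lambda p: p["ctstate"] == "INVALID", "DROP"),
        ("-j ufw-user-input", always, "ufw-user-input"),
    ],
    "ufw-user-input": [
        # ASSUMPTION: the redacted xxx.129 is nfs-server.
        ("-s xxx.129/32 -p tcp --dport 983 -j ACCEPT",
         lambda p: p["src"] == "nfs-server" and p["proto"] == "tcp" and p["dport"] == 983,
         "ACCEPT"),
    ],
}


def run_chain(chains: Dict[str, List[Rule]], name: str, pkt: Packet) -> Optional[Verdict]:
    for label, match, target in chains[name]:
        if not match(pkt):
            continue
        if target in ("ACCEPT", "DROP"):
            return target, f"{name}: {label}"
        # Jump to a user-defined chain; continue here if it falls off its end.
        verdict = run_chain(chains, target, pkt)
        if verdict is not None:
            return verdict
    return None  # end of chain: implicit RETURN to the caller


def evaluate(chains: Dict[str, List[Rule]], pkt: Packet, policy: str = "DROP") -> Verdict:
    verdict = run_chain(chains, "INPUT", pkt)
    return verdict if verdict is not None else (policy, "INPUT policy")


def with_workaround(chains: Dict[str, List[Rule]]) -> Dict[str, List[Rule]]:
    """Copy of the chains with the accept rule from the workaround appended."""
    patched = {name: list(rules) for name, rules in chains.items()}
    patched["ufw-before-logging-input"].append(
        ("-p tcp --sport 2049 -s nfs-server -j ACCEPT",
         lambda p: p["proto"] == "tcp" and p["sport"] == 2049 and p["src"] == "nfs-server",
         "ACCEPT"))
    return patched


if __name__ == "__main__":
    synack = {"iface": "eno1.490", "ctstate": "INVALID", "proto": "tcp",
              "src": "nfs-server", "sport": 2049, "dport": 983}
    print(evaluate(CHAINS, synack))
    print(evaluate(with_workaround(CHAINS), synack))
```

In this model the SYN/ACK labeled INVALID ends at "ufw-before-input: --ctstate INVALID -j DROP", so the user rule accepting --dport 983 is never consulted for it.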

I could obviously fix the issue for now by going to every machine and either rebooting it or adding the iptables rule that allows incoming traffic from nfs-server:2049. Given our network's (in)stability, however, I am convinced the problem would resurface sooner or later. So I would appreciate any hints or recommendations on where the problem lies and how we can prevent it from recurring. Thanks!

Some additional version info:

$ iptables --version
iptables v1.8.4 (legacy)
$ uname -a
Linux nfs-client 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release --all
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal
$ sudo apt list --installed | grep nfs
libnfsidmap2/focal,now 0.25-5.1ubuntu1 amd64 [installed,automatic]
nfs-common/focal-updates,now 1:1.3.4-2.5ubuntu3.4 amd64 [installed]