I'm playing with nftables and observe strange behaviour which I cannot explain.
I have three VMs: source, router and destination. All run latest Oracle EL 8.5 and are configured via nft.
source has a single network interface, enp0s8, with IP 10.111.111.1 in a /24 subnet.
router has two network interfaces: enp0s8 with IP 10.111.111.2 in a /24 subnet, and enp0s9 with IP 10.100.100.2 in a /24 subnet.
destination has a single network interface, enp0s8, with IP 10.100.100.1 in a /24 subnet.
My goal is to have destination hidden from source behind NAT, with IP 10.200.200.1. What I've done:
- Enabled IP routing on router.
- Blocked direct access from 10.111.111.0/24 to 10.100.100.0/24 on router.
- Added static route 10.200.200.0/24 via 10.111.111.2 (router) on source.
- Configured NAT on router as follows:
chain prerouting {
    type nat hook prerouting priority dstnat; policy accept;
    iifname "enp0s8" ip daddr 10.200.200.1 dnat to 10.100.100.1
}
chain postrouting {
    type nat hook postrouting priority srcnat; policy accept;
    ip saddr 10.100.100.1 oifname "enp0s8" snat to 10.200.200.1
}
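For reference, the forwarding and static-route steps from the list above can be reproduced roughly as follows. This is a minimal sketch using standard sysctl and iproute2 commands; the exact invocations and the `dev enp0s8` argument are assumptions rather than a verbatim copy of what was run, and the filter rule that blocks direct access is not shown here.

```shell
# On router: enable IPv4 forwarding (runtime only; not persistent across reboots)
sysctl -w net.ipv4.ip_forward=1

# On source: reach the NAT subnet 10.200.200.0/24 via router (10.111.111.2)
# (assumption: the route goes out of source's only interface, enp0s8)
ip route add 10.200.200.0/24 via 10.111.111.2 dev enp0s8

# On source: check which route the kernel selects for the NAT address
ip route get 10.200.200.1
```

All three commands need root, and the first one has to be run on router while the other two are run on source.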
Everything works as expected: destination is reachable from source only as 10.200.200.1 and not as 10.100.100.1 (yes, I know working as root is bad practice; these are just experimental VMs):
[root@source ~]# ping 10.100.100.1
PING 10.100.100.1 (10.100.100.1) 56(84) bytes of data.
^C
--- 10.100.100.1 ping statistics ---
15 packets transmitted, 0 received, 100% packet loss, time 14320ms
[root@source ~]# ping 10.200.200.1
PING 10.200.200.1 (10.200.200.1) 56(84) bytes of data.
64 bytes from 10.200.200.1: icmp_seq=1 ttl=63 time=0.554 ms
64 bytes from 10.200.200.1: icmp_seq=2 ttl=63 time=1.80 ms
64 bytes from 10.200.200.1: icmp_seq=3 ttl=63 time=1.84 ms
^C
--- 10.200.200.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2043ms
rtt min/avg/max/mdev = 0.554/1.397/1.836/0.598 ms
But when I run traceroute, or send a ping with TTL=1, the reply comes from 10.200.200.1 instead of the router's IP, 10.111.111.2:
[root@source ~]# traceroute 10.200.200.1
traceroute to 10.200.200.1 (10.200.200.1), 30 hops max, 60 byte packets
1 10.200.200.1 (10.200.200.1) 0.752 ms 0.679 ms 0.984 ms
2 10.200.200.1 (10.200.200.1) 1.181 ms 1.130 ms 1.070 ms
[root@source ~]# ping 10.200.200.1 -c 1 -t 1
PING 10.200.200.1 (10.200.200.1) 56(84) bytes of data.
From 10.200.200.1 icmp_seq=1 Time to live exceeded
--- 10.200.200.1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
If I do the same for any other address in the 10.200.200.0/24 subnet, the replies come from the expected IP:
[root@source ~]# ping 10.200.200.2 -c 1 -t 1
PING 10.200.200.2 (10.200.200.2) 56(84) bytes of data.
From 10.111.111.2 icmp_seq=1 Time to live exceeded
--- 10.200.200.2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
Could anyone please clarify why, in the first case, the ICMP TTL exceeded reply has the IP of the final destination, while in the second case it has the IP of the router?