I am trying to figure out where a connection is getting dropped in a complex SDN environment that combines nftables rules with an Open vSwitch switch running complex flow rules.
I have a connection originating from 111.222.73.199 and targeting 222.333.61.241 (neither is a real address). The destination address is accessible through a VLAN interface on the target host:
# ip addr show bond0.2180
9: bond0.2180@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 10:7d:1a:9c:7c:1d brd ff:ff:ff:ff:ff:ff
inet 222.333.61.23/24 scope global bond0.2180
valid_lft forever preferred_lft forever
The default route on that system does not go out the public interface; the main routing table looks like:
default via 10.30.6.1 dev bond0 proto dhcp src 10.30.6.23 metric 300
10.30.6.0/23 dev bond0 proto kernel scope link src 10.30.6.23 metric 300
10.30.10.0/23 dev bond0.2173 proto kernel scope link src 10.30.10.23 metric 402
10.88.0.0/16 dev cni-podman0 proto kernel scope link src 10.88.0.1 linkdown
10.128.0.0/14 dev tun0 scope link
10.255.116.0/23 via 10.30.10.1 dev bond0.2173 proto dhcp src 10.30.10.23 metric 402
172.30.0.0/16 dev tun0
222.333.61.0/24 dev bond0.2180 proto kernel scope link src 222.333.61.23
We have some policy-based routing rules in place to handle traffic over the public interface:
# ip rule show
0: from all lookup local
32764: from 222.333.61.0/24 lookup main suppress_prefixlength 0
32765: from 222.333.61.0/24 lookup 200
32766: from all lookup main
32767: from all lookup default
Where routing table 200 has:
default via 222.333.61.1 dev bond0.2180
With nftrace enabled, we can see that the inbound packet enters the PREROUTING chain in the nat table and gets as far as a dnat rule (this all looks fine):
trace id 7a66a648 ip nat PREROUTING packet: iif "bond0.2180" ether saddr 00:09:0f:09:00:22 ether daddr 10:7d:1a:9c:7c:1d ip saddr 111.222.73.199 ip daddr 222.333.61.241 ip dscp af21 ip ecn not-ect ip ttl 49 ip id 8129 ip length 60 tcp sport 47392 tcp dport 80 tcp flags == syn tcp window 64240
[...]
trace id 7a66a648 ip nat KUBE-SEP-CLHTNA52WCATND65 rule meta l4proto tcp counter packets 0 bytes 0 dnat to 10.129.4.95:9991 (verdict accept)
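For reference, here is a minimal sketch of how tracing like this can be switched on for just the source address in question. The table name trace_debug, the chain name, and the priority of -301 are illustrative assumptions, not necessarily what is configured on this host; the run helper is also hypothetical and only prints each command unless APPLY=1 is set (the real commands need root).

```shell
# Sketch only: names and priority below are assumptions for illustration.
SRC=111.222.73.199

# Print each command; execute it only when APPLY=1 (requires root).
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# A dedicated table and chain hooked early in prerouting, so the trace
# flag is set before the packet reaches the nat PREROUTING chain.
run nft add table ip trace_debug
run nft add chain ip trace_debug prerouting '{ type filter hook prerouting priority -301 ; }'
run nft add rule ip trace_debug prerouting ip saddr "$SRC" meta nftrace set 1

# Trace events are then read with: nft monitor trace
# Cleanup afterwards:              nft delete table ip trace_debug
```

Keeping the trace rule in its own table means it can be removed in one step without touching the rules that Kubernetes manages.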
Because we entered the dnat rule through the PREROUTING chain, the dnat should result in a route lookup, which gets us:
# ip route get 10.129.4.95
10.129.4.95 dev tun0 src 10.131.2.1 uid 0
cache
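Note that the lookup above is resolved as if the packet were generated locally (hence src 10.131.2.1 and uid 0). A sketch of the same lookup for a forwarded packet, assuming the post-DNAT packet still carries the original source address and arrived on bond0.2180, would pass both to ip route get. The run helper is hypothetical and only prints the command unless APPLY=1 is set.

```shell
# Sketch only: addresses and interface are taken from the trace above.
DST=10.129.4.95        # destination after DNAT
SRC=111.222.73.199     # original client address
IIF=bond0.2180         # interface the packet arrived on

# Print each command; execute it only when APPLY=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Ask the kernel to resolve the route as it would for a packet
# received on $IIF from $SRC, rather than for local output.
run ip route get "$DST" from "$SRC" iif "$IIF"
```

With from and iif given, the result reflects the input path, including the policy rules shown earlier, so it may differ from the locally originated lookup.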
Where tun0 is an Open vSwitch interface:
# ip -d addr show tun0
14: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 22:10:ac:4b:ca:3c brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
openvswitch numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 10.131.2.1/23 brd 10.131.3.255 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::2010:acff:fe4b:ca3c/64 scope link
valid_lft forever preferred_lft forever
Attached to the OVS bridge br0:
# ovs-vsctl show
02f8a53c-c970-419f-9c42-0b0be382638f
Bridge br0
fail_mode: secure
[...]
Port vxlan0
Interface vxlan0
type: vxlan
options: {dst_port="4789", key=flow, remote_ip=flow}
[...]
Port br0
Interface br0
type: internal
[...]
Port tun0
Interface tun0
type: internal
[...]
ovs_version: "2.17.3"
I believe that at the point the packet is accepted by the dnat rule, we have:
- source address: 111.222.73.199:47392
- destination address: 10.129.4.95:9991
If we plug these values into ovs-appctl ofproto/trace, we get the following:
# ovs-appctl ofproto/trace br0 in_port=tun0,tcp,nw_src=111.222.73.199,nw_dst=10.129.4.95,tcp_src=47392,tcp_dst=9991
Flow: tcp,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0
bridge("br0")
-------------
0. ct_state=-trk,ip, priority 1000
ct(table=0)
drop
-> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 0.
-> Sets the packet to an untracked state, and clears all the conntrack fields.
Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-trk,eth,ip,in_port=2,nw_frag=no
Datapath actions: ct,recirc(0x428ac)
===============================================================================
recirc(0x428ac) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================
Flow: recirc_id=0x428ac,ct_state=new|trk,eth,tcp,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0
bridge("br0")
-------------
thaw
Resuming from table 0
0. ip,in_port=2, priority 200
goto_table:30
30. priority 0
goto_table:31
31. ip,nw_dst=10.128.0.0/14, priority 100
goto_table:90
90. ip,nw_dst=10.129.4.0/23, priority 100, cookie 0x1173adfa
move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
-> NXM_NX_TUN_ID[0..31] is now 0
set_field:10.30.6.19->tun_dst
output:1
-> output to kernel tunnel
Final flow: recirc_id=0x428ac,ct_state=new|trk,eth,tcp,tun_src=0.0.0.0,tun_dst=10.30.6.19,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0
Megaflow: recirc_id=0x428ac,ct_state=-rpl+trk,eth,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=2,nw_src=64.0.0.0/2,nw_dst=10.129.4.0/23,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0x0,dst=10.30.6.19,ttl=64,tp_dst=4789,flags(df|key))),2
According to the above, the packet should be emitted over the VXLAN tunnel (tunnel ID 0) to host 10.30.6.19, but we never see that traffic on the network.
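The check for that traffic can be sketched as a capture of the encapsulated packets on the underlay. This assumes the tunnel leaves via bond0 (the main routing table puts 10.30.6.0/23 on that interface) and uses the VXLAN port 4789 from the vxlan0 options; the run helper is hypothetical and only prints the command unless APPLY=1 is set.

```shell
# Sketch only: interface, peer, and port come from the output above.
UNDERLAY_IF=bond0      # assumed egress interface toward the tunnel peer
PEER=10.30.6.19        # tun_dst set by the table 90 flow
VXLAN_PORT=4789        # dst_port of the vxlan0 interface

# Print each command; execute it only when APPLY=1 (requires root).
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Capture only VXLAN-encapsulated packets headed to the peer host.
run tcpdump -n -i "$UNDERLAY_IF" "udp dst port $VXLAN_PORT and dst host $PEER"
```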
Additionally, if I enable debug logging for the OVS dpif facility, like this:
ovs-appctl vlog/set file:dpif:dbg
I never see the source address (111.222.73.199), the destination address (10.129.4.95), or the destination port (9991) in the logs.
I am looking for any suggestions to help figure out where this connection is going (or even to verify that it is entering OVS as I expect).