Tracing a packet across a DNAT boundary and into Open vSwitch

I am trying to figure out where a connection is being dropped in an SDN environment that combines nftables rules with an Open vSwitch bridge carrying a complex set of flow rules.

I have a connection originating from 111.222.73.199 and targeting 222.333.61.241 (neither is a real address). The destination address is reachable through a VLAN interface on the target host:

# ip addr show bond0.2180
9: bond0.2180@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 10:7d:1a:9c:7c:1d brd ff:ff:ff:ff:ff:ff
    inet 222.333.61.23/24 scope global bond0.2180
       valid_lft forever preferred_lft forever

The default route on that system is not out the public address; the main routing table looks like:

default via 10.30.6.1 dev bond0 proto dhcp src 10.30.6.23 metric 300
10.30.6.0/23 dev bond0 proto kernel scope link src 10.30.6.23 metric 300
10.30.10.0/23 dev bond0.2173 proto kernel scope link src 10.30.10.23 metric 402
10.88.0.0/16 dev cni-podman0 proto kernel scope link src 10.88.0.1 linkdown
10.128.0.0/14 dev tun0 scope link
10.255.116.0/23 via 10.30.10.1 dev bond0.2173 proto dhcp src 10.30.10.23 metric 402
172.30.0.0/16 dev tun0
222.333.61.0/24 dev bond0.2180 proto kernel scope link src 222.333.61.23

We have some policy-based routing rules in place to handle traffic over the public interface:

# ip rule show
0:      from all lookup local
32764:  from 222.333.61.0/24 lookup main suppress_prefixlength 0
32765:  from 222.333.61.0/24 lookup 200
32766:  from all lookup main
32767:  from all lookup default

Where routing table 200 has:

default via 222.333.61.1 dev bond0.2180
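
To make the intended effect of these rules concrete, here is a rough Python model of the lookup order. It is only a sketch under several assumptions: 203.0.113.0/24 and 198.51.100.199 are documentation-range stand-ins for the anonymised addresses (which are not valid IPv4), the local and default tables are left out, and the `route` and `longest_match` helpers are hypothetical names, not anything the kernel exposes.

```python
import ipaddress

# Stand-in for the anonymised public subnet (assumption, see above).
PUBLIC = "203.0.113"

MAIN = [
    ("0.0.0.0/0", "via 10.30.6.1 dev bond0"),
    ("10.30.6.0/23", "dev bond0"),
    ("10.30.10.0/23", "dev bond0.2173"),
    ("10.88.0.0/16", "dev cni-podman0"),
    ("10.128.0.0/14", "dev tun0"),
    ("10.255.116.0/23", "via 10.30.10.1 dev bond0.2173"),
    ("172.30.0.0/16", "dev tun0"),
    (f"{PUBLIC}.0/24", "dev bond0.2180"),
]
TABLE_200 = [("0.0.0.0/0", f"via {PUBLIC}.1 dev bond0.2180")]

# (priority, source selector, table, suppress_prefixlength or None)
RULES = [
    (32764, f"{PUBLIC}.0/24", MAIN, 0),
    (32765, f"{PUBLIC}.0/24", TABLE_200, None),
    (32766, "0.0.0.0/0", MAIN, None),
]


def longest_match(table, dst):
    """Return (network, nexthop) for the most specific route covering dst."""
    dst = ipaddress.ip_address(dst)
    best = None
    for prefix, nexthop in table:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nexthop)
    return best


def route(src, dst):
    """Walk the rules in priority order and return (rule priority, nexthop)."""
    src = ipaddress.ip_address(src)
    for prio, selector, table, suppress in sorted(RULES, key=lambda r: r[0]):
        if src not in ipaddress.ip_network(selector):
            continue
        match = longest_match(table, dst)
        if match is None:
            continue
        net, nexthop = match
        # suppress_prefixlength N discards matches whose prefix length is <= N,
        # so with N = 0 the default route in main is skipped for this rule.
        if suppress is not None and net.prefixlen <= suppress:
            continue
        return prio, nexthop
    return None


print(route(f"{PUBLIC}.23", "198.51.100.199"))  # public source, external destination
print(route(f"{PUBLIC}.23", "10.129.4.95"))     # public source, pod network destination
print(route("10.30.6.23", "198.51.100.199"))    # internal source, external destination
```

In this model, traffic sourced from the public subnet still uses the specific routes in main (such as 10.128.0.0/14 via tun0), and only falls through to the default route in table 200 when main has nothing more specific than its own default.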

With nftrace enabled, we can see that the inbound packet enters the PREROUTING chain in the nat table and gets as far as a dnat rule (this all looks fine):

trace id 7a66a648 ip nat PREROUTING packet: iif "bond0.2180" ether saddr 00:09:0f:09:00:22 ether daddr 10:7d:1a:9c:7c:1d ip saddr 111.222.73.199 ip daddr 222.333.61.241 ip dscp af21 ip ecn not-ect ip ttl 49 ip id 8129 ip length 60 tcp sport 47392 tcp dport 80 tcp flags == syn tcp window 64240
[...]
trace id 7a66a648 ip nat KUBE-SEP-CLHTNA52WCATND65 rule meta l4proto tcp   counter packets 0 bytes 0 dnat to 10.129.4.95:9991 (verdict accept)

Because the dnat rule was reached through the PREROUTING chain, the routing decision should be made against the rewritten destination, which gets us:

# ip route get 10.129.4.95
10.129.4.95 dev tun0 src 10.131.2.1 uid 0
    cache

Here tun0 is an Open vSwitch interface:

# ip -d addr show tun0
14: tun0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 22:10:ac:4b:ca:3c brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
    openvswitch numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 10.131.2.1/23 brd 10.131.3.255 scope global tun0
       valid_lft forever preferred_lft forever
    inet6 fe80::2010:acff:fe4b:ca3c/64 scope link
       valid_lft forever preferred_lft forever

It is attached to the OVS bridge br0:

# ovs-vsctl show
02f8a53c-c970-419f-9c42-0b0be382638f
    Bridge br0
        fail_mode: secure
        [...]
        Port vxlan0
            Interface vxlan0
                type: vxlan
                options: {dst_port="4789", key=flow, remote_ip=flow}
        [...]
        Port br0
            Interface br0
                type: internal
        [...]
        Port tun0
            Interface tun0
                type: internal
        [...]
    ovs_version: "2.17.3"

I believe that at the point the packet is accepted by the dnat rule, we have:

  • source address: 111.222.73.199:47392
  • destination address: 10.129.4.95:9991
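
A minimal sketch of the rewrite I am assuming, written in Python purely for illustration. The `FiveTuple`, `dnat`, `expected_reply`, and `ofproto_trace_flow` names are hypothetical helpers, and the sketch assumes that only the destination is rewritten (no SNAT or masquerade applies afterwards).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FiveTuple:
    proto: str
    src: str
    sport: int
    dst: str
    dport: int


def dnat(pkt, to_addr, to_port):
    """Rewrite only the destination address and port, leaving the source alone."""
    return replace(pkt, dst=to_addr, dport=to_port)


def expected_reply(pkt):
    """Tuple expected for return traffic: source and destination swapped."""
    return FiveTuple(pkt.proto, pkt.dst, pkt.dport, pkt.src, pkt.sport)


def ofproto_trace_flow(pkt, in_port):
    """Build the flow argument for ovs-appctl ofproto/trace (TCP fields only)."""
    return (
        f"in_port={in_port},{pkt.proto},nw_src={pkt.src},nw_dst={pkt.dst},"
        f"tcp_src={pkt.sport},tcp_dst={pkt.dport}"
    )


# Values taken from the nftables trace above.
original = FiveTuple("tcp", "111.222.73.199", 47392, "222.333.61.241", 80)
translated = dnat(original, "10.129.4.95", 9991)

print(translated)
print(expected_reply(translated))
print(ofproto_trace_flow(translated, "tun0"))
```

If that assumption is wrong (for example, if a later chain also rewrites the source), the flow handed to OVS would differ from the one traced here.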

If we plug these values into ovs-appctl ofproto/trace, we get the following:

# ovs-appctl ofproto/trace br0 in_port=tun0,tcp,nw_src=111.222.73.199,nw_dst=10.129.4.95,tcp_src=47392,tcp_dst=9991
Flow: tcp,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0

bridge("br0")
-------------
 0. ct_state=-trk,ip, priority 1000
    ct(table=0)
    drop
     -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 0.
     -> Sets the packet to an untracked state, and clears all the conntrack fields.

Final flow: unchanged
Megaflow: recirc_id=0,ct_state=-trk,eth,ip,in_port=2,nw_frag=no
Datapath actions: ct,recirc(0x428ac)

===============================================================================
recirc(0x428ac) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
===============================================================================

Flow: recirc_id=0x428ac,ct_state=new|trk,eth,tcp,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0

bridge("br0")
-------------
    thaw
        Resuming from table 0
 0. ip,in_port=2, priority 200
    goto_table:30
30. priority 0
    goto_table:31
31. ip,nw_dst=10.128.0.0/14, priority 100
    goto_table:90
90. ip,nw_dst=10.129.4.0/23, priority 100, cookie 0x1173adfa
    move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31]
     -> NXM_NX_TUN_ID[0..31] is now 0
    set_field:10.30.6.19->tun_dst
    output:1
     -> output to kernel tunnel

Final flow: recirc_id=0x428ac,ct_state=new|trk,eth,tcp,tun_src=0.0.0.0,tun_dst=10.30.6.19,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=0,in_port=2,vlan_tci=0x0000,dl_src=00:00:00:00:00:00,dl_dst=00:00:00:00:00:00,nw_src=111.222.73.199,nw_dst=10.129.4.95,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=47392,tp_dst=9991,tcp_flags=0
Megaflow: recirc_id=0x428ac,ct_state=-rpl+trk,eth,ip,tun_id=0/0xffffffff,tun_dst=0.0.0.0,in_port=2,nw_src=64.0.0.0/2,nw_dst=10.129.4.0/23,nw_ecn=0,nw_frag=no
Datapath actions: set(tunnel(tun_id=0x0,dst=10.30.6.19,ttl=64,tp_dst=4789,flags(df|key))),2

According to the above, the packet should be sent over the VXLAN tunnel (tunnel ID 0) to host 10.30.6.19, but we never see that traffic on the network.

Additionally, if I enable debug logging for the OVS dpif facility, like this:

ovs-appctl vlog/set file:dpif:dbg

I never see the source address (111.222.73.199), the destination address (10.129.4.95), or the destination port (9991) in the logs.

I am looking for any suggestions to help figure out where this connection is going (or even to verify that it is entering OVS as I expect).
