iptables NETMAP not reliably adjusting source address of multicast UDP packets

In an embedded/IoT use case, I have a management host running Linux that needs to be able to talk to multiple networks that each use a common set of static IP addresses.

This mostly works fine, including for UDP multicast traffic, given:

  • network links for each embedded network (call them eth1, eth2, etc)
  • a separate Linux network namespace for each different embedded network (call them ns1, ns2, etc)
  • a peer link between each network namespace and the root namespace (call them peer1, peer2, etc from the network namespace side and veth1, veth2, etc from the root namespace side)
  • an iptables NETMAP rule in each namespace to map the conflicting static IP subnet (call it 192.168.0.x) to a non-conflicting set of static IP subnets (call them 192.168.1.x, 192.168.2.x, etc)
  • an smcrouted instance inside each network namespace to forward multicast group registrations
  • a separate IP address in a distinct (non-NAT) subnet for the root namespace side of the peer links, as a workaround for the issue described in this question (call it 192.168.(x+100).1)

To try to visualise the traffic flows:

[|root namespace|::veth1] <-> [peer1::(namespace ns1)::eth1] <-> embedded network
[|              |::veth2] <-> [peer2::(namespace ns2)::eth2] <-> embedded network
... etc ...
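For concreteness, the plumbing for one namespace might be created along these lines. This is only a sketch: the interface and address names are the illustrative ones from above, and the commands assume root.

```shell
# Sketch of the ns1 plumbing described above (illustrative names/addresses)
ip netns add ns1
ip link add veth1 type veth peer name peer1
ip link set peer1 netns ns1

# Root-namespace side of the peer link lives in the distinct non-NAT subnet
ip addr add 192.168.101.1/24 dev veth1
ip link set veth1 up

# Namespace side of the peer link, plus the physical link to the embedded net
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip addr add 192.168.101.2/24 dev peer1
ip netns exec ns1 ip link set peer1 up
ip link set eth1 netns ns1
ip netns exec ns1 ip link set eth1 up
```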

ns1 example NETMAP rules for the static IP subnets:

sudo -n ip netns exec ns1 iptables -t nat -A PREROUTING -d 192.168.1.0/24 -i peer1 -j NETMAP --to 192.168.0.0/24
sudo -n ip netns exec ns1 iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o peer1 -j NETMAP --to 192.168.1.0/24
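NETMAP is a pure 1:1 prefix translation: the network bits are replaced and the host bits are preserved, so for a /24 only the last octet's host is carried over unchanged. A trivial illustration of the POSTROUTING direction above (the helper function is purely for illustration, not part of any tool):

```shell
# Mimic the POSTROUTING NETMAP rule for a /24: swap the network prefix
# 192.168.0.0/24 for 192.168.1.0/24 while keeping the host octet.
map_to_root() {
    echo "$1" | awk -F. '{ printf "192.168.1.%s\n", $4 }'
}

map_to_root 192.168.0.42    # prints 192.168.1.42
```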

ns1 example smcrouted config rules for a supported multicast group:

mgroup from eth1 group 239.255.60.60
mgroup from peer1 group 239.255.60.60
mroute from eth1 group 239.255.60.60 to peer1
mroute from peer1 source 192.168.101.1 group 239.255.60.60 to eth1
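When trying to work out which of the two states the system is in, it can help to compare what the kernel reports inside the namespace with what is actually on the wire. These are standard iproute2/iptables/tcpdump invocations; interface and group names are the illustrative ones from this question:

```shell
# Multicast forwarding cache as smcrouted installed it in ns1
ip netns exec ns1 ip mroute show

# NETMAP hit counters: if the POSTROUTING counter isn't increasing while
# multicast is flowing, the translation isn't being applied to those packets
ip netns exec ns1 iptables -t nat -L POSTROUTING -v -n

# Source addresses actually on the wire, on each side of the peer link
ip netns exec ns1 tcpdump -ni peer1 udp and host 239.255.60.60
tcpdump -ni veth1 udp and host 239.255.60.60
```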

The actual topic of this question is a weird glitch in the NETMAP source IP adjustment that I haven't been able to explain, only work around.

My expected behaviour:

  • UDP multicast subscriptions inside the network namespace will see the unmodified pre-NETMAP 192.168.0.x source addresses
  • UDP multicast subscriptions inside the root namespace will see the modified post-NETMAP 192.168.1.x source addresses

That isn't what happens, though. Instead, subscribers in both namespaces see either the pre-NETMAP 192.168.0.x source addresses or the post-NETMAP 192.168.1.x addresses.

The source filter on the "mroute from peer1" rule in the smcroute configuration is there to prevent a multicast routing loop that otherwise starts when the server flips into the second state.

So far I haven't been able to determine what causes the transition between the two states; I can only work around the problem at the application layer, adjusting based on the active network namespace or the source network interface when the source address information looks wrong.

The goal of asking the question is to help figure out which of the following applies:

  • this isn't expected to work, compensating at the application layer is the best that can be done (which seems unlikely given the use of network namespaces in Linux container environments)
  • there's something else that needs to be configured (or not configured) in the kernel, iptables, or smcroute to keep the misbehaviour from happening

(Note: this is a super-esoteric, very specific question, so I did wonder if Network Engineering might be more appropriate, but https://networkengineering.stackexchange.com/questions/64744/linux-local-multicast-egress-follows-forward-chain-when-smcroute-is-active made it clear that that is for working with commercial routers etc, not for Linux network namespace config. I'm less clear on the boundaries between Server Fault and the Unix & Linux stack exchange when it comes to configuring Linux servers, though)

Answer (score: 1)

Maintainer of SMCRoute here. This should definitely work; we use this exact approach, albeit with actual hardware rather than network namespaces, for various customers at work.

There is a very similar problem reported in the SMCRoute issue tracker; the only difference from your setup is that they don't use 1:1 NAT with NETMAP (yet).

I've whipped up a test case for this in preparation for the next release (v2.5). I run all tests locally and in GitHub Actions (Azure cloud) using:

cd test/
unshare -mrun ./multi.sh

The test has two separate routers (R1 and R2) in dedicated network namespaces, with a shared LAN segment between them (192.168.0.0/24). Behind each router is a private LAN (10.0.0.0/24), which is the same for both routers. An extra (dummy) interface eth1 is used as the source of multicast routed to the shared LAN (eth0). The NETMAP rules use the PREROUTING and POSTROUTING chains, translating the R1 private LAN to 192.168.10.0/24 and the R2 private LAN to 192.168.20.0/24. As you can see below, the multicast routes installed in the kernel use the 1:1-mapped (global) addresses.

>> Starting emitters ...                                                           
R1[2811708]: New multicast data from 192.168.10.1 to group 225.1.2.3 on VIF 1
R1[2811708]: Add 192.168.10.1 -> 225.1.2.3 from VIF 1
R2[2811709]: New multicast data from 192.168.10.1 to group 225.1.2.3 on VIF 0
R2[2811709]: Add 192.168.10.1 -> 225.1.2.3 from VIF 0
R2[2811709]: New multicast data from 192.168.20.1 to group 225.1.2.3 on VIF 1
R2[2811709]: Add 192.168.20.1 -> 225.1.2.3 from VIF 1
R1[2811708]: New multicast data from 192.168.20.1 to group 225.1.2.3 on VIF 0
R1[2811708]: Add 192.168.20.1 -> 225.1.2.3 from VIF 0
>> R1 multicast routes and 1:1 NAT ...                                             
(192.168.10.1,225.1.2.3)         Iif: eth1       Oifs: eth0  State: resolved
(192.168.20.1,225.1.2.3)         Iif: eth0       Oifs: eth1  State: resolved
Chain PREROUTING (policy ACCEPT 5 packets, 244 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 NETMAP     all  --  any    any     anywhere             192.168.10.0/24      to:10.0.0.0/24

Chain INPUT (policy ACCEPT 1 packets, 84 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 4 packets, 248 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 2 packets, 124 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    2   124 NETMAP     all  --  any    any     10.0.0.0/24          anywhere             to:192.168.10.0/24
>> R2 multicast routes and 1:1 NAT ...                                             
(192.168.10.1,225.1.2.3)         Iif: eth0       Oifs: eth1  State: resolved
(192.168.20.1,225.1.2.3)         Iif: eth1       Oifs: eth0  State: resolved
Chain PREROUTING (policy ACCEPT 4 packets, 204 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    1    84 NETMAP     all  --  any    any     anywhere             192.168.20.0/24      to:10.0.0.0/24

Chain INPUT (policy ACCEPT 2 packets, 168 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 3 packets, 164 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 1 packets, 40 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    2   124 NETMAP     all  --  any    any     10.0.0.0/24          anywhere             to:192.168.20.0/24
>> Analyzing ...                                                                   
    1 0.000000000 0.000000000 192.168.10.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe769, seq=1/256, ttl=2
    2 1.000105261 1.000105261 192.168.10.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe769, seq=2/512, ttl=2
    3 1.000957268 0.000852007 192.168.20.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe76b, seq=1/256, ttl=2
    4 2.024216212 1.023258944 192.168.10.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe769, seq=3/768, ttl=2
    5 2.024216229 0.000000017 192.168.20.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe76b, seq=2/512, ttl=2
    6 3.048426868 1.024210639 192.168.10.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe769, seq=4/1024, ttl=2
    7 3.048426842 -0.000000026 192.168.20.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe76b, seq=3/768, ttl=2
    8 4.072270331 1.023843489 192.168.10.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe769, seq=5/1280, ttl=2
    9 4.072270458 0.000000127 192.168.20.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe76b, seq=4/1024, ttl=2
   10 5.096430449 1.024159991 192.168.20.1 → 225.1.2.3    ICMP 98 Echo (ping) request  id=0xe76b, seq=5/1280, ttl=2
 => 10 for group ff04::114, expected >= 8

It's maybe a bit hard to read; you may have to consult the test case for details. Anyway, I get consistent results in the translation, which by the way is the responsibility of Linux, not SMCRoute, so you may have a kernel bug or something. My personal workstation runs Linux Mint with kernel 5.11.0, and the backend servers for GitHub Actions run Ubuntu 20.04 LTS with kernel 5.8.0; both are fairly heavily patched distro kernels, but maybe a baseline to start from?

Comment (OP):
Thank you! Knowing that it *should* be working at least lets me know I haven't completely messed up the config :) The misbehaviour was originally identified on Debian 9, which is pretty ancient at this point. I should have a system with a 5.14.x kernel to test on in the not-too-distant future, but I'll also scan the 4.9 LTS kernel changelogs to see if there are any plausibly related bug reports.
Comment (OP):
Browsing through https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?qt=grep&q=multicast I'm starting to wonder if something I omitted from the scenario description to try to make it easier to explain is actually critical to provoking the problem: the embedded networks aren't physically separate, they're different VLANs on a tagged VLAN trunk connection. That brings the kernel's bridge multicast handling into play in addition to everything else :(
Comment (OP):
So, tentative hypothesis based on this answer and browsing the last couple of years of kernel commit messages that mention "multicast": there may be an issue in older kernels that has been fixed by the various updates to the bridge and macvlan multicast handling in newer kernels. Next steps will be to see if the problem can be reproduced with a 4.9.283 kernel (latest LTS version for Debian 9), or an even newer 5.8.0+ kernel.