Score:1

Allowing a route to/from network when there are multiple networks

I have a tough one here but can't seem to figure out the right routing.

I have a server (serverA) that is on two separate networks: 192.168.200.0/24 and 192.168.117.64/26. This server has a hostname (serverA.example.com) on 192.168.117.70 and a separate connection via 192.168.200.45 (this is required to mount storage, etc. on this network).

I do not have a way to do split-DNS internally so I need to route everything to this IP/Hostname for sanity of the users.

Currently everything that is not on the 192.168.200.x network is able to hit serverA.example.com just fine. But all systems on the 192.168.200.x network are unable to route to the 192.168.117.70 IP address, though they can get to 192.168.200.45 just fine.

Here is the routing I have setup on serverA:

0.0.0.0           192.168.117.65    0.0.0.0           UG    0      0        0 onboard-10Gb-1
192.168.117.64    0.0.0.0           255.255.255.192   U     0      0        0 onboard-10Gb-1
192.168.200.0     0.0.0.0           255.255.255.0     U     0      0        0 bond0

I don't have much experience with routing and such, so I know I'm missing something here.

A.B
It looks like a routing problem caused by multi-homing. There's one important thing to know in advance with this: does the server provide UDP services (rather than TCP services)? UDP services have additional issues on multi-homing systems compared to TCP, even once routing is fixed.
Derek Edwards
Yes, I know that this is multi-homing. I'm hoping to get help with how to configure routing to make this work...
Derek Edwards
Deleted the other one. Yes, this is using both TCP and UDP. It is mostly using TCP, though, so UDP would be worth a test once I can get TCP working.
Score:1
A.B

TL;DR

ip route add 192.168.117.64/26 dev onboard-10Gb-1 table 1000 
ip route add default via 192.168.117.65 dev onboard-10Gb-1 table 1000
ip rule add from 192.168.117.70 lookup 1000

For more details, and for proper handling of UDP services, read below.


Note: on Linux, route and ifconfig are obsolete commands. They are not suitable for advanced routing such as policy routing. One should systematically use iproute2 replacements instead: ip route, ip link and ip address (and all other related commands from the iproute2 suite).


Policy routing

Being multi-homed, the server will by default reply directly on the attached LAN 192.168.200.0/24 (the examples below use a hypothetical system at 192.168.200.101) when queried from there, rather than following the path the query came from. This is an asymmetric flow, which can fail for various reasons, among them:

  • the server itself, when configured with rp_filter=1, will drop such asymmetric traffic following Strict Reverse Path Forwarding rules.
  • a firewall in the path that doesn't see the replies might be configured to drop traffic (e.g. when tracking the TCP window).
  • if NAT happens somewhere, the direct reply isn't un-NAT-ed and is dropped by the client.
  • the server, when hosting a UDP service that is not multi-homing aware, will choose the wrong reply address, making the client drop the reply.

TCP would probably work in the absence of the first three cases, but UDP is harder because of the last case: policy routing alone is often not enough for UDP (see Caveat below), although it is still required.

This requires policy routing so that each address is considered separately when making the routing decision for a reply.

On Linux this is implemented with routing rules whose selectors point to alternate routing tables. Each such table knows only the path needed for the selected goal: a partial copy of all the possible routes. The selector is usually a criterion other than the destination (which is already handled by standard route entries). Most of the time it's the source address, but that depends on the goal.

Here the goal is to have a routing table that doesn't know specifically about 192.168.200.0/24 so it gets routed using the default route over onboard-10Gb-1 instead of the LAN route on bond0 when a reply is made from 192.168.117.70.

Duplicate only the needed routes in routing table 1000 (value 1000 chosen arbitrarily):

ip route add 192.168.117.64/26 dev onboard-10Gb-1 table 1000 
ip route add default via 192.168.117.65 dev onboard-10Gb-1 table 1000

When the source address is from 192.168.117.70, meaning it's the server's address, look up the alternate routing table 1000 (before looking up the main routing table: if the lookup succeeds in giving a route, the main table won't be used):

ip rule add from 192.168.117.70 lookup 1000

An equivalent table and rule for the other LAN could be added, but it's already handled by the main routing table. Incoming traffic is already handled first by the local routing table, so there's nothing more to do with this setup:

# ip rule
0:      from all lookup local
32765:  from 192.168.117.70 lookup 1000
32766:  from all lookup main
32767:  from all lookup default
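
To make the lookup order above concrete, here is a minimal Python sketch of the selection logic. It is a toy model only, not the kernel's implementation: the `TABLES` and `RULES` structures and the `lookup`/`route` helpers are names invented for this illustration, and the local and default tables are left out for brevity.

```python
import ipaddress

# Toy model of Linux policy routing (an illustration, not the kernel's code).
# Each table maps a destination prefix to a (gateway, device) pair.
TABLES = {
    "main": {
        "0.0.0.0/0": ("192.168.117.65", "onboard-10Gb-1"),
        "192.168.117.64/26": (None, "onboard-10Gb-1"),
        "192.168.200.0/24": (None, "bond0"),
    },
    "1000": {
        "0.0.0.0/0": ("192.168.117.65", "onboard-10Gb-1"),
        "192.168.117.64/26": (None, "onboard-10Gb-1"),
    },
}

# (priority, source selector, table); local and default tables are omitted.
RULES = [
    (32765, "192.168.117.70/32", "1000"),
    (32766, "0.0.0.0/0", "main"),
]


def lookup(table, dst):
    """Longest-prefix match of dst in one table; None when nothing matches."""
    dst = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in TABLES[table].items()
        if dst in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def route(src, dst):
    """Walk the rules by priority; the first table that yields a route wins."""
    src = ipaddress.ip_address(src)
    for _prio, selector, table in sorted(RULES):
        if src in ipaddress.ip_network(selector):
            hop = lookup(table, dst)
            if hop is not None:
                return table, hop
    return None
```

In this model, `route("192.168.117.70", "192.168.200.101")` resolves in table 1000 through the gateway 192.168.117.65, while `route("192.168.200.45", "192.168.200.101")` falls through to the main table and leaves directly on bond0.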

Then on the server, depending on the way services are used:

  • A TCP service always works, whether or not it binds to a specific address

    Any accept(2)-ed socket has its source (local) address set to the destination address the query used, so emitted packets will match the routing rule selector when needed: nothing more to do for this case.

  • A TCP or UDP client can bind the source address when making a query or connection, to change the path:

    TCP and UDP examples:

    ssh -b 192.168.117.70 user@192.168.200.101
    
    traceroute -n -s 192.168.117.70 192.168.200.101
    

The intended alternate routing table 1000 will be selected according to the source address chosen:

# ip route get from 192.168.117.70 to 192.168.200.101
192.168.200.101 from 192.168.117.70 via 192.168.117.65 dev onboard-10Gb-1 table 1000 uid 0 
    cache 
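
The TCP behaviour described above (an accept(2)-ed socket takes the query's destination as its local address) can be observed with a small Python sketch. It assumes a Linux host, where every address in 127.0.0.0/8 is local to the loopback interface; 127.0.0.2 stands in for the server's second address, and the helper name `accepted_local_address` is made up for this example.

```python
import socket


def accepted_local_address(dest_ip="127.0.0.2"):
    """Connect to a wildcard listener via dest_ip; return the accepted socket's local IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("0.0.0.0", 0))        # wildcard bind, kernel-chosen port
        listener.listen(1)
        listener.settimeout(2)
        port = listener.getsockname()[1]
        with socket.create_connection((dest_ip, port), timeout=2):
            conn, _peer = listener.accept()
            with conn:
                # The local address is the one the client connected to,
                # even though the listener itself is bound to 0.0.0.0.
                return conn.getsockname()[0]
```

Because the accepted socket carries that address as its source, every packet it emits matches a `from <address>` routing rule without any change to the application.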

Caveat: UDP service

Because of the way UDP and the BSD socket API work, a UDP socket that doesn't bind to an address (i.e. uses 0.0.0.0, aka INADDR_ANY) doesn't by default have the full context of a query when it's used to reply to a received UDP message. In particular, contrary to TCP, it won't have the local address of the server the query was sent to: it will just use the socket's 0.0.0.0 address.

So when replying to a query from 192.168.200.101 to 192.168.117.70, it will present a source of 0.0.0.0 to the routing stack, deferring to it the selection of the actual source address. This won't match the routing rule selector for 192.168.117.70, so the reply will use the main routing table and get the wrong source address: 192.168.200.45. When the client receives such a reply (directly from the same LAN), it won't recognize it as a reply to its query, since it comes from another address, and will ignore it.
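
The symptom can be reproduced on a single machine with the following Python sketch. It assumes a Linux host, where all of 127.0.0.0/8 is local to the loopback interface: 127.0.0.2 plays the role of 192.168.117.70 and 127.0.0.1 that of the address the main table prefers. The function name `wildcard_reply_source` is invented for this illustration.

```python
import socket


def wildcard_reply_source(dest_ip="127.0.0.2"):
    """Query a wildcard-bound UDP server at dest_ip; return the reply's source IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("0.0.0.0", 0))          # INADDR_ANY, kernel-chosen port
        server.settimeout(2)
        client.settimeout(2)
        port = server.getsockname()[1]

        client.sendto(b"query", (dest_ip, port))
        _data, client_addr = server.recvfrom(1024)
        # The socket has no record of dest_ip: the kernel picks the source
        # address from the route towards client_addr.
        server.sendto(b"reply", client_addr)

        _data, reply_addr = client.recvfrom(1024)
        return reply_addr[0]
```

On such a host the query sent to 127.0.0.2 is answered from 127.0.0.1. The client here is deliberately left unconnected so that it can observe the mismatched reply; a client that connect(2)-ed to 127.0.0.2 would never see it.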

There are two ways to have the UDP server application handle this correctly:

  • bind(2) to a specific address.

    Any reply using this socket will use the address it was bound to, thus selecting the routing rule and the intended routing behavior. If the server has to provide service on all of its addresses, it should bind the same way multiple times: once per address.

    There are settings in most daemons to do just that. For example, ISC's DNS server BIND 9 by default binds to each address belonging to the server and follows dynamic changes. When a query arrives on such a socket, the address of the bound socket is the address of the query's destination. It will be reused as the source address, thus selecting the correct routing rule.

  • otherwise, use the socket option IP_PKTINFO

    This enables the reception of ancillary data by the application, allowing it to know on what address and on what interface the packet was received, which gives it all the information needed for a correct reply. This requires specific application support, including the use of additional functions such as cmsg(3).

    For example, that's the mode of operation of NLnet Labs's DNS server unbound when using the server option interface-automatic: yes.
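
As a rough illustration of the IP_PKTINFO approach, here is a minimal Python sketch. It assumes Linux (the fallback constant 8 and the in_pktinfo layout are Linux-specific) and uses loopback addresses in place of the real ones; the helper names `reply_from_queried_address` and `pktinfo_reply_source` are hypothetical, and a real daemon would need proper error handling.

```python
import socket
import struct

# socket.IP_PKTINFO only exists in recent Python versions; 8 is the Linux value.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)
# struct in_pktinfo: int ipi_ifindex; in_addr ipi_spec_dst; in_addr ipi_addr
PKTINFO = struct.Struct("=i4s4s")


def reply_from_queried_address(server):
    """Receive one datagram and echo it back from the address it was sent to."""
    data, ancdata, _flags, client_addr = server.recvmsg(
        1024, socket.CMSG_SPACE(PKTINFO.size))
    for level, kind, payload in ancdata:
        if level == socket.IPPROTO_IP and kind == IP_PKTINFO:
            _ifindex, _spec_dst, queried = PKTINFO.unpack(payload[:PKTINFO.size])
            # ifindex 0 leaves the interface choice to routing; ipi_spec_dst
            # sets the source address used for the reply.
            control = PKTINFO.pack(0, queried, bytes(4))
            server.sendmsg([data], [(socket.IPPROTO_IP, IP_PKTINFO, control)],
                           0, client_addr)
            return socket.inet_ntoa(queried)
    raise RuntimeError("no IP_PKTINFO ancillary data received")


def pktinfo_reply_source(dest_ip="127.0.0.2"):
    """Query a wildcard-bound, IP_PKTINFO-aware server; return the reply's source IP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
        server.bind(("0.0.0.0", 0))          # still bound to INADDR_ANY
        server.settimeout(2)
        client.settimeout(2)
        port = server.getsockname()[1]

        client.sendto(b"query", (dest_ip, port))
        reply_from_queried_address(server)
        _data, reply_addr = client.recvfrom(1024)
        return reply_addr[0]
```

With the explicit source in place, the reply to a query sent to 127.0.0.2 comes back from 127.0.0.2, so on the real server it would carry 192.168.117.70 and match the routing rule.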

If the UDP server application can't be changed at all, there are only bad choices left.

Using Netfilter's conntrack and iptables won't work: one could change the destination in the output hook, but it's the source that has to be changed. One could change the source in the postrouting hook but, as the name implies, that happens after routing: too late for the alternate route to be chosen. Even if it were allowed, since this amounts to NAT-ing a reply of an already existing flow, Netfilter wouldn't cope correctly with it: it would change the source port used for the reply to avoid a supposed clash instead of reusing the same flow.

In such a case one can use additional selectors that choose an alternate source depending on the service and the destination (which in this case is the reply's destination), to force the other choice: a source of 192.168.117.70 instead of 192.168.200.45 (and now a direct query from 192.168.200.101 to 192.168.200.45 would fail instead, for similar reasons).

For example, if the server were to host a simple UDP service on port 5555 that can't be configured to bind to 192.168.117.70 or to use IP_PKTINFO, and that should never be used directly LAN-to-LAN on 192.168.200.0/24, one can nudge the correct route selection with (this requires kernel >= 4.17):

ip rule add from 0.0.0.0/32 ipproto udp sport 5555 to 192.168.200.0/24 lookup 1000

Here 0.0.0.0/32 is used in its INADDR_ANY role. The routing stack will replace it with an adequate source in the end, but this time chosen from using routing table 1000.

Before:

# ip route get ipproto udp sport 5555  to 192.168.200.101
192.168.200.101 dev bond0 src 192.168.200.45 uid 0 
    cache 

After:

# ip route get ipproto udp sport 5555 to 192.168.200.101
192.168.200.101 via 192.168.117.65 dev onboard-10Gb-1 table 1000 src 192.168.117.70 uid 0 
    cache 

Other cases are not affected (e.g. source port 5556):

# ip route get ipproto udp sport 5556 to 192.168.200.101
192.168.200.101 dev bond0 src 192.168.200.45 uid 0 
    cache 

nftables

Actually, an inferior alternative to the second ip rule added just above also exists: using nftables to do static NAT, which doesn't depend on Netfilter's conntrack. Here's the ruleset to load with nft -f ...:

table t_statelessnat
delete table t_statelessnat

table ip t_statelessnat {
        chain c_snat {
                type route hook output priority raw; policy accept;
                ip daddr 192.168.200.0/24 ip saddr != 192.168.117.70 udp sport 5555 ip saddr set 192.168.117.70
        }
}

The type route hook will take care of rerouting the packet which will now traverse routing table 1000.

Prefer the routing rule unless a complex filter that can't be expressed with ip rule is required.

Derek Edwards
Wow, thank you for this extremely detailed answer. Can you point me to somewhere I can learn ip route/link/address in this much detail? This seems like it would be extremely useful for me in the future.
A.B
I'm not sure there's specific documentation about this UDP problem: it's just a "known problem" with UDP sockets that one learns along the use of IP_PKTINFO (IP_RECVDSTADDR on FreeBSD...). About routing, there's some old but still mostly relevant doc there: https://lartc.org/howto/index.html
Derek Edwards
Attempting to add the ip rule for UDP because the system is hosting a webserver which, it seems, uses UDP for downloading content:
ip rule add from 0.0.0.0/32 ipproto udp sport 8080 to 192.168.200.0/24 lookup 1000
Error: argument "ipproto" is wrong: Failed to parse rule type
ip route get ipproto udp sport 8080 to 192.168.200.200
Error: any valid prefix is expected rather than "ipproto".
A.B
I did state that this requires kernel >= 4.17 (and an equivalent iproute2 package), and I also wrote that it's a last resort. Just change the webserver settings to bind to the correct address: that's the first method described.
Derek Edwards
It was already working on its own, nevermind! Got it!
Derek Edwards
Can I just add these commands to a `route` file? Or how do I get these routes to persist?
A.B
The method to integrate this depends on the distribution. On Debian with ifupdown it can be added as `up` etc. commands. On RHEL6/7 it can be added in adequate files in /etc/sysconfig/network-scripts/ . For NetworkManager, systemd-networkd or netplan I have no idea.
Derek Edwards
Thank you, got it with rule-* with RHEL. Thanks for all the help!