Score:2

Why is Linux policy-based routing (PBR) not working for ping?


First of all, this question is phrased in terms of Linux, but I think it is really about basic routing concepts.

I happen to have the following configuration:

(network diagram; image not available)

What I am trying to do is to ensure symmetric routing on the server (CentOS 7), so that incoming and outgoing traffic from it takes the same path for any pair of nodes (using both network interfaces).

Suppose I set the static IP address 192.168.0.210/24 for eno1 and 192.168.1.210/24 for eno2 (eno1 and eno2 are the equivalent of eth0 and eth1 in other Linux distributions).

Then I created 2 routing tables (one for each network interface) in /etc/iproute2/rt_tables:

...
101     net1
102     net2

Then I created routes in each routing table and policy routing rules to direct outbound traffic to the appropriate routing table, as follows:

$ ip route show table net1
default via 192.168.0.1 dev eno1 
192.168.0.0/24 dev eno1 scope link

$ ip route show table net2
default via 192.168.1.1 dev eno2 
192.168.1.0/24 dev eno2 scope link

$ ip rule show
0:      from all lookup local 
101:    from 192.168.0.0/24 lookup net1 
102:    from 192.168.1.0/24 lookup net2 
32766:  from all lookup main 
32767:  from all lookup default

These are the first tests I did (which worked as expected):

$ ip route get 192.168.100.100 from 192.168.0.210
192.168.100.100 from 192.168.0.210 via 192.168.0.1 dev eno1
    cache

$ ip route get 192.168.100.100 from 192.168.1.210
192.168.100.100 from 192.168.1.210 via 192.168.1.1 dev eno2
    cache

$ ip route get 192.168.100.100
192.168.100.100 via 192.168.1.1 dev eno2 src 192.168.1.210
    cache

Finally, using the tshark tool I started monitoring the network interfaces eno1 and eno2 and made requests through each, for example:

$ curl --interface eno1 https://google.com
$ curl --interface eno2 https://google.com
$ traceroute -i eno1 google.com
$ traceroute -i eno2 google.com
$ ping -I eno1 -c 2 google.com
$ ping -I eno2 -c 2 google.com

The first 4 commands worked as expected (incoming and outgoing traffic was properly captured by tshark on each network interface), but the ping commands did not. This is the output from tshark for the ping commands:

(tshark capture screenshot; image not available)

As you can see, the ping worked only for eno2. After trial and error, I realized that the ping only worked for the network interface that was associated with the generic default gateway:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eno2
169.254.0.0     0.0.0.0         255.255.0.0     U     1002   0        0 eno1
169.254.0.0     0.0.0.0         255.255.0.0     U     1003   0        0 eno2
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eno1
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eno2

From my understanding, the ping commands should have worked even without a generic default gateway, since the default gateways set in net1 and net2 should have been used. Is this correct?

Why is this happening? Does it have to do with the way ping works? Why did the first 4 commands work?

Peter Zhabin
Does eno1 have any other IP addresses assigned to it? Can you show complete output of `ip a l`?
Score:3
A.B

There's a difference between binding to an interface and binding to an IP address. The two kinds of working commands (curl and traceroute) also bind to an interface, but they avoid the problem that ping runs into (explained later). Let's start by fixing ping. I reproduced OP's setup to illustrate.

The route lookup when only binding to an interface is not:

ip route get  from 192.168.1.210

but:

# ip route get oif eno2 to 192.168.100.100
192.168.100.100 dev eno2 src 192.168.1.210 uid 0 
    cache 

Tables 101 and 102 are not involved here, since no local source address is specified in the lookup. Moreover, there is no default route in the main routing table for 192.168.100.100. But because the interface was forced to eno2, such a default route is created automatically... without a gateway. The visible symptom is ARP requests sent from 192.168.1.210 for 192.168.100.100, since the bogus route says that 192.168.100.100 is directly reachable.
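The ARP symptom follows from how the next hop of a route is chosen. Here is a minimal Python sketch of that decision, assuming a heavily simplified route record; `Route` and `arp_target` are hypothetical names for illustration, not kernel or iproute2 APIs:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    dev: str
    gateway: Optional[str] = None  # None models a route "without gateway"


def arp_target(route: Route, destination: str) -> str:
    """Return the IP address whose MAC address has to be resolved via ARP."""
    # With a gateway, frames go to the gateway.
    # Without one, the destination itself is treated as an on-link neighbor.
    return route.gateway if route.gateway is not None else destination


bogus = Route(dev="eno2")                          # route created for the forced interface
proper = Route(dev="eno2", gateway="192.168.1.1")  # route as defined in table 102

print(arp_target(bogus, "192.168.100.100"))   # 192.168.100.100
print(arp_target(proper, "192.168.100.100"))  # 192.168.1.1
```

With the gateway-less route the host asks "who has 192.168.100.100?" on eno2, and nobody on that segment answers.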

Had OP also added the (usually useless) additional default route with higher metric, such as:

ip route add default via 192.168.1.1 dev eno2 metric 101

then:

# ip route get oif eno2 to 192.168.100.100
192.168.100.100 via 192.168.1.1 dev eno2 src 192.168.1.210 uid 0 
    cache 

Now, since a matching default route through eno2 already exists, it is selected, with the correct gateway. ping would now work. Routing table 102 is still not involved.

Routing rule selector for bound interface

The correct way to get the route defined in table 102 used is the oif selector in ip rule:

oif NAME

select the outgoing device to match. The outgoing interface is only available for packets originating from local sockets that are bound to a device.

Let's use it (and delete the 2nd default route to show it's not needed anymore):

ip route delete default via 192.168.1.1 dev eno2 metric 101
ip rule add oif eno1 lookup 101
ip rule add oif eno2 lookup 102

The lookup now matches and becomes:

# ip route get oif eno2 to 192.168.100.100
192.168.100.100 via 192.168.1.1 dev eno2 table 102 src 192.168.1.210 uid 0 
    cache 

This time, as the selector matched, the correct routing table was used, with a default defined with a gateway.

That's what had to be done.
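To see why the oif rules make the difference, here is a rough Python model of rule selection. It is a sketch under stated assumptions, not the kernel's algorithm: `Rule` and `select_table` are hypothetical names, the local and default rules are left out, every table is assumed to contain a matching route (so falling through to the next rule is not modeled), and the prefs 100 and 99 for the new oif rules are illustrative values.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    pref: int
    table: str
    src_prefix: Optional[str] = None  # "from PREFIX" selector, None means "from all"
    oif: Optional[str] = None         # "oif NAME" selector, None means any device


def select_table(rules: List[Rule], src: str, oif: Optional[str]) -> Optional[str]:
    """Return the table of the lowest-pref rule whose selectors all match."""
    for rule in sorted(rules, key=lambda r: r.pref):
        if rule.src_prefix is not None:
            if ipaddress.ip_address(src) not in ipaddress.ip_network(rule.src_prefix):
                continue
        if rule.oif is not None and rule.oif != oif:
            continue
        return rule.table
    return None


original = [
    Rule(101, "net1", src_prefix="192.168.0.0/24"),
    Rule(102, "net2", src_prefix="192.168.1.0/24"),
    Rule(32766, "main"),
]
# Assumed prefs for the two added oif rules.
fixed = original + [Rule(100, "net1", oif="eno1"), Rule(99, "net2", oif="eno2")]

# Bound to eno2 only, source still unspecified (0.0.0.0):
print(select_table(original, "0.0.0.0", "eno2"))        # main
print(select_table(fixed, "0.0.0.0", "eno2"))           # net2
# Source address known, as in the "from 192.168.1.210" lookups:
print(select_table(original, "192.168.1.210", "eno2"))  # net2
```

The first line of output is the failing ping case: with no source address, neither "from" rule matches and the lookup ends up in main.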

Note: ping also accepts binding to an IP address (ping -I 192.168.1.210 -c 2 google.com) or even both (ping -I eno2 -I 192.168.1.210 -c 2 google.com). These cases would have worked without additional routing rules, as explained.


Why did the first two cases work anyway?

(Remove the correction above and...)

As seen in the previous faulty route resolution:

# ip route get oif eno2 to 192.168.100.100
192.168.100.100 dev eno2 src 192.168.1.210 uid 0 
    cache 

the correct IP source address still gets selected. As soon as TCP has to emit a packet, its route lookup will be presented with the source address 192.168.1.210. This case does match the selector in rule pref 102:

# ip route get from 192.168.1.210 oif eno2 to 192.168.100.100
192.168.100.100 from 192.168.1.210 via 192.168.1.1 dev eno2 table 102 uid 0 
    cache 

Table 102 still got selected by rule pref 102. Once the adequate table is selected, no matter why it was selected, correct routing happens.

For UDP it's a bit more complicated, because it depends on whether the UDP client uses connect(2). If it does, it behaves like the previous TCP case: a source address will be used. If it does not, no source address is set. For example, this command would fail to be routed correctly without the missing routing rules, because it doesn't use connect(2) and the routing stack is queried without a source (i.e. with INADDR_ANY = 0.0.0.0):

echo test | socat udp4-datagram:203.0.113.1:8888,so-bindtodevice=eno2 -

while this one would succeed because it uses connect(2):

echo test | socat          udp4:203.0.113.1:8888,so-bindtodevice=eno2 -

Of course both will work fine once the two missing routing rules are added.
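The effect of connect(2) on the source address can be observed from userspace. The following Python sketch is only an illustration on the loopback interface (so it needs neither root nor the eno2 setup); it shows that an unconnected UDP socket keeps the wildcard source 0.0.0.0, while connect(2) makes the kernel pick a concrete source address:

```python
import socket

# A local receiver, only so that there is a valid destination to connect to.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
target = receiver.getsockname()

# Unconnected socket: bound to the wildcard address, no source chosen yet.
unconnected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
unconnected.bind(("", 0))
addr_unconnected = unconnected.getsockname()[0]

# Connected socket: connect(2) performs a route lookup and fixes the source.
connected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
connected.connect(target)
addr_connected = connected.getsockname()[0]

print(addr_unconnected)  # 0.0.0.0
print(addr_connected)    # 127.0.0.1

for s in (receiver, unconnected, connected):
    s.close()
```

In OP's setup, the connected socket would end up with 192.168.1.210 as its source, which is what lets the "from 192.168.1.0/24" rule match on later lookups.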

One can check with strace that traceroute does use connect(2):

...
socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "eno2\0", 5) = 0
...
connect(3, {sa_family=AF_INET, sin_port=htons(33434), sin_addr=inet_addr("203.0.113.1")}, 28) = 0
...
Tedpac
Hello, A.B. Thank you very much for your reply. I have not yet found the time to test it, but since you were the only one to respond and the answer sounds correct, I will give you the bounty. I will accept the answer once I have tested it. Congratulations on your 10k. :)