I have a weird use case where a pod running in Azure Kubernetes needs to route traffic from specific ports to specific targets through a dedicated VPN tunnel. The targets have private IPs, so different targets can end up with the same IP. Besides doing the routing, the pod is also the OpenVPN server that the targets connect to. An example:
Communications arriving at port 10 are routed to IP 10.0.0.4:80 through VPN IP 10.118.0.2
and at the same time we can have:
Communications arriving at port 20 are routed to IP 10.0.0.4:80 through VPN IP 10.118.0.3
Despite the target IP being the same, these are different machines. In order to achieve this, I came up with this possible solution:
/sbin/iptables --table mangle --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "10" --jump MARK --set-mark "10"
/sbin/iptables --table nat --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "10" --jump DNAT --to-destination "10.0.0.4:80"
/sbin/ip rule add prio "10" from all fwmark "10" lookup "10"
/sbin/ip route add "10.0.0.4" via "10.118.0.2" table 10
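For completeness, the second tunnel from the example would get a matching set of rules; this is just a sketch, and the mark/table number 20 is an illustrative choice:
/sbin/iptables --table mangle --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "20" --jump MARK --set-mark "20"
/sbin/iptables --table nat --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "20" --jump DNAT --to-destination "10.0.0.4:80"
/sbin/ip rule add prio "20" from all fwmark "20" lookup "20"
/sbin/ip route add "10.0.0.4" via "10.118.0.3" table 20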
This would allow both communications to work at the same time, with the traffic being routed to the correct machine. But what I see is that the packets get marked in the mangle table yet never reach the nat table. I found out that this is related to rp_filter (more on that below). The setup as it stands right now, and which is working, looks like this:
/sbin/iptables --table nat --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "10" --jump DNAT --to-destination "10.0.0.4:80"
/sbin/ip route add "10.0.0.4" via "10.118.0.2"
However, if a second route is established, like in the first example, the commands would look like this:
/sbin/iptables --table nat --insert PREROUTING --destination "192.168.0.100" -i eth0 -p tcp --dport "20" --jump DNAT --to-destination "10.0.0.4:80"
/sbin/ip route add "10.0.0.4" via "10.118.0.3"
This would create another route in the main routing table for the same target, but then a user accessing 192.168.0.100 could get routed to either the machine behind 10.118.0.3 or the one behind 10.118.0.2.
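A quick way to see the ambiguity is to ask the kernel which route it would actually use for that destination (output omitted):
ip route show to match 10.0.0.4
ip route get 10.0.0.4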
Besides these rules, the following one was always enabled in every setup, so that traffic can get back through the tap0 interface, which is where communication to 10.118.0.X goes:
iptables -t nat -A POSTROUTING -o tap0 -j MASQUERADE
Unfortunately I can't know the user's source IP, otherwise this would be simple to solve. The source IP of any communication arriving at these ports will always be the same, because the traffic has to go through another service that masks the real source IP.
I saw in other topics that in order for marked incoming packets to be handled correctly in a container/pod I need to disable rp_filter. However, I cannot do that: writing the sysctl fails with a read-only file system error, and I don't know if it's even possible to change that in Azure Kubernetes clusters.
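For reference, this is roughly what I would need to run inside the pod, assuming rp_filter really is what drops the marked packets (value 2 is loose mode; the effective setting per interface is the maximum of the "all" and per-interface values), but it fails in an unprivileged container because /proc/sys is mounted read-only:
sysctl -w net.ipv4.conf.all.rp_filter=2
sysctl -w net.ipv4.conf.eth0.rp_filter=2
sysctl -w net.ipv4.conf.tap0.rp_filter=2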
Is there any other solution besides marking packets? Or is something still missing in the packet-marking approach?