
Route outgoing packets from a VM to the GRE tunnel that incoming packets came from? (TCP flows)


We have a network infrastructure with Anycast IPs on edge servers in multiple countries. Those servers encapsulate traffic in GRE tunnels to endpoint nodes hosting virtual machines.

Currently we route all TX traffic to the default route via one of our EDGEs, but the resulting asymmetry and sometimes poor RTT are a problem for us.

(Example: the outgoing TCP SYN packet goes to the default route (EDGE ams-1), while the incoming TCP SYN/ACK comes from EDGE de-1 in another country.)

The problem is routing outgoing traffic from the virtual machines to the edge server that the incoming traffic of the same TCP flow came from.

The solution would be a sort of symmetric routing as described above. We tried fwmarks/connmarks with no luck (or maybe we used them wrong?):

sysctl -w net.netfilter.nf_conntrack_tcp_loose=1

ip rule add fwmark 10001 table 10001 priority 45
ip route add default dev e-kiev1 table 10001

iptables -t mangle -A PREROUTING -p tcp -i vm+ -m set --match-set networks src -j CONNMARK --restore-mark

iptables -t mangle -A PREROUTING -i e-kiev1 -p tcp --tcp-flags SYN SYN -m set --match-set networks dst -j CONNMARK --set-mark 10001

(commands repeated for all edge servers)

We can pay for a solution.

A.B

There are two issues: handling marks properly, and missing routes.

mark and connmark

The routing stack knows about the firewall mark (aka fwmark aka mark). It doesn't know anything about Netfilter's connmark. This has an effect on the first SYN packet.

The ruleset must take steps to copy the value from the flow to the packet when needed. Here the first rule does copy the flow's connmark to the packet's mark, but at the initial step, when a SYN packet is encountered, the 2nd rule directly sets the flow's connmark without ever setting the packet's own mark. So the SYN packet will not be evaluated differently by the routing stack right after, since it doesn't have a mark, while all other packets of this flow will be, since they then inherit the connmark as their mark.

The iptables rules should be changed like this (further optimization, simplification and factorization are certainly possible and should be done, but are left as an exercise):

iptables -t mangle -A PREROUTING -p tcp -i vm+ -m set --match-set networks src -j CONNMARK --restore-mark
# no need to continue evaluation if we already have the mark.
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j RETURN

# various per-POP rules
iptables -t mangle -A PREROUTING -i e-kiev1 -p tcp --tcp-flags SYN SYN -m set --match-set networks dst -j MARK --set-mark 10001

# after all per-POP rules are done, store the mark to connmark. Required only once: at the end.
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j CONNMARK --save-mark
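Since the per-POP rule is repeated for every edge server, a small dry-run generator can keep the ordering right (restore, RETURN, per-POP MARK rules, then a single save-mark). This is a minimal sketch, assuming each POP tunnel is described as an interface:mark pair; the function name gen_mangle_rules and the second POP e-ams1 with mark 10002 are hypothetical. It only prints commands and applies nothing.

```shell
#!/usr/bin/env bash
# Dry-run generator (hypothetical helper): prints the mangle/PREROUTING rules
# for a list of POP tunnels given as "interface:mark" pairs. Nothing is applied.
set -eu

gen_mangle_rules() {
    # Copy the flow's connmark into the packet mark for traffic from the VMs.
    echo 'iptables -t mangle -A PREROUTING -p tcp -i vm+ -m set --match-set networks src -j CONNMARK --restore-mark'
    # Packets that already carry a mark need no further evaluation.
    echo 'iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j RETURN'
    # One MARK rule per POP tunnel: mark the packet itself, not only the flow.
    local pair ifname mark
    for pair in "$@"; do
        ifname="${pair%%:*}"
        mark="${pair##*:}"
        echo "iptables -t mangle -A PREROUTING -i ${ifname} -p tcp --tcp-flags SYN SYN -m set --match-set networks dst -j MARK --set-mark ${mark}"
    done
    # Save the packet mark into the connmark once, after all per-POP rules.
    echo 'iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j CONNMARK --save-mark'
}

# Example: e-kiev1/10001 is from the question, e-ams1/10002 is hypothetical.
gen_mangle_rules e-kiev1:10001 e-ams1:10002
```

Review the output before piping it to sh as root: the rules are appended with -A, so running it twice duplicates them.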

Note: with net.netfilter.nf_conntrack_tcp_loose=1 (which is the default anyway), -p tcp --tcp-flags SYN SYN should probably be replaced with -p tcp -m conntrack --ctstate NEW (or even with no -p tcp at all, to also handle UDP, ICMP etc. properly). That way, if this router reboots (and thus loses its memory of flows), it can pick up established connections on the fly as new ones, and the proper mark of a flow is resumed the first time a packet from "outside" to "inside" is seen again (if the first packets seen for such a flow go from "inside" to "outside", they will still be temporarily misrouted). With the current rule, such a flow can never be re-marked properly: there's no SYN in its traffic anymore.
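Applied to the kiev1 rules, the suggestion in this note could look like the fragment below, replacing the corresponding lines of the ruleset above while keeping the RETURN and --save-mark rules as they are. This is an untested sketch; also dropping -p tcp from the restore rule is an assumption, made so that non-TCP flows get their mark restored for traffic coming back from the VMs.

```shell
# Restore rule without "-p tcp" (assumption: so UDP/ICMP traffic from the VMs
# also gets its flow's mark back).
iptables -t mangle -A PREROUTING -i vm+ -m set --match-set networks src -j CONNMARK --restore-mark
# Per-POP rule keyed on conntrack state instead of the SYN flag, so flows picked
# up mid-stream (e.g. after a reboot, with nf_conntrack_tcp_loose=1) get marked.
iptables -t mangle -A PREROUTING -i e-kiev1 -m conntrack --ctstate NEW -m set --match-set networks dst -j MARK --set-mark 10001
```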

Proper policy routing requires more than the default route

There's a flaw in the routing part: the alternate routing table contains only a single default route. This makes packets get routed back to where they came from before they reach their intended target, creating a loop and dropping the traffic.

Following the OP's example:

  • A new TCP packet with SYN, from a new flow, arrives on interface e-kiev1
  • mangle/PREROUTING sees the packet first and assigns firewall mark 10001 to it
  • the routing stack sees the packet
  • its fwmark 10001 matches the routing rule with priority 45, which is evaluated before the main table's rule
  • the packet traverses routing table 10001
  • the packet matches the default route to interface e-kiev1
  • the packet is sent back to where it came from without ever reaching the intended target
  • the next-hop router (which was actually the "previous-hop" router) sees this packet (again) and routes it (again) to the current router
  • LOOP: the same SYN packet arrives on interface e-kiev1 again
  • it inherits the mark (or is marked again, it doesn't matter)
  • (repeat)
  • the IPv4 TTL eventually reaches 0: the packet is dropped
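To observe this decision on a live router, iproute2's ip route get can simulate the lookup for a packet arriving on e-kiev1 with and without mark 10001. The addresses below are hypothetical placeholders (a VM behind this router and a remote client). If the marked lookup reports e-kiev1 as the output device, the packet would indeed be sent back into the tunnel it came from. Depending on the router's configuration (reverse-path filtering, for instance), the lookup may report an error instead.

```shell
# Hypothetical addresses: 192.0.2.10 = a VM behind this router,
# 198.51.100.7 = a remote client reached through the kiev1 POP.
# Lookup for a packet arriving on e-kiev1 carrying fwmark 10001:
ip route get 192.0.2.10 from 198.51.100.7 iif e-kiev1 mark 10001
# Same lookup without a mark, for comparison (main table only):
ip route get 192.0.2.10 from 198.51.100.7 iif e-kiev1
```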

All internal routes in the main table, i.e. those that don't involve a specific anycast point of presence (POP) router, should be duplicated into all additional routing tables so that packets can reach their internal target. If the main table also has specific per-POP routes, they should of course be duplicated too, but only into the matching POP's table rather than into all additional tables.
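For the duplication approach, a small dry-run helper can turn the main table's routes into commands for each additional table. This is a minimal sketch; the function name dup_routes and the sample addresses are hypothetical. It skips the default route and strips the linkdown flag that ip route show may print (other status flags may need the same treatment). Multipath routes (continuation lines starting with nexthop) and per-POP routes are not handled and should be filtered out beforehand.

```shell
#!/usr/bin/env bash
# Dry-run helper (hypothetical): reads "ip route show" output on stdin and prints
# "ip route replace ... table N" for every non-default route and every table
# number given as an argument. Nothing is applied.
set -eu

dup_routes() {
    local route table
    # Default IFS: read trims the trailing space ip route show often prints.
    while read -r route; do
        # Skip empty lines and the default route: each POP table keeps its own.
        case "$route" in
            ''|default*) continue ;;
        esac
        # "linkdown" is a status flag, not a valid "ip route replace" argument.
        route="${route// linkdown/}"
        for table in "$@"; do
            echo "ip route replace ${route} table ${table}"
        done
    done
}

# Example with canned input instead of a live "ip route show table main":
printf '%s\n' \
    'default via 203.0.113.1 dev e-ams1' \
    '10.8.0.0/24 dev vm0 proto kernel scope link src 10.8.0.1' \
    | dup_routes 10001 10002
```

On a real router this would be used as ip route show table main | dup_routes 10001 10002, reviewing the output before piping it to sh. Using replace rather than add keeps reruns idempotent.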

Another, easier method, which may or may not work depending on the complete setup, is to first consult the main routing table while bypassing its default route, and only then consult the additional per-POP routing tables. This avoids duplicating routes into each additional routing table, and the maintenance burden that comes with it, by using a single additional routing rule for the whole setup.

For this example, one can just add:

ip rule add prio 40 table main suppress_prefixlength 0

There must be no other routing rule (except of course the rule for the local routing table) with a lower priority value than this rule, so that it is evaluated before any per-POP rule.

table main suppress_prefixlength 0 means: "if the main table matched with its default route (i.e. 0.0.0.0/0) rather than with a more specific route, ignore the result as if no route had matched yet and continue evaluating the following routing rules."
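The routing side of the whole setup can also be produced by a small dry-run script. This is a minimal sketch, assuming each POP is described as an interface:mark pair and that the mark value is reused as the table number, as in the question; gen_routing and the e-ams1/10002 pair are hypothetical. It emits the single suppress_prefixlength rule first, then the question's per-POP fwmark rule and default route for each tunnel.

```shell
#!/usr/bin/env bash
# Dry-run generator (hypothetical helper): prints the policy-routing commands
# for "interface:mark" pairs. Nothing is applied.
set -eu

gen_routing() {
    # Consult the main table first, ignoring its default route (prefix length 0).
    echo 'ip rule add prio 40 table main suppress_prefixlength 0'
    local pair ifname mark
    for pair in "$@"; do
        ifname="${pair%%:*}"
        mark="${pair##*:}"
        # Mark value doubles as the table number, as in the question.
        echo "ip rule add fwmark ${mark} table ${mark} priority 45"
        echo "ip route add default dev ${ifname} table ${mark}"
    done
}

# Example: e-kiev1/10001 is from the question, e-ams1/10002 is hypothetical.
gen_routing e-kiev1:10001 e-ams1:10002
```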
