There are two issues: handling marks properly, and missing routes.
mark and connmark
The routing stack knows about the firewall mark (aka fwmark aka mark). It doesn't know anything about Netfilter's connmark. This has an effect on the first SYN packet.
The ruleset must take steps to copy the value from the flow to the packet when needed. Here it does copy the flow's connmark to the packet's mark, but at the initial step, when it encounters a SYN packet, the 2nd rule directly sets the flow's connmark without ever setting a mark on the packet itself. The SYN packet therefore has no mark and is not treated differently by the routing stack right after, while all other packets of this flow are, since they inherit the connmark as their mark.
The iptables rules should be changed like this (further optimization, simplification and factorization are certainly possible and should be done, but are left as an exercise):
iptables -t mangle -A PREROUTING -p tcp -i vm+ -m set --match-set networks src -j CONNMARK --restore-mark
# no need to continue evaluation if we already have the mark.
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j RETURN
# various per-POP rules
iptables -t mangle -A PREROUTING -i e-kiev1 -p tcp --tcp-flags SYN SYN -m set --match-set networks dst -j MARK --set-mark 10001
# after all per-POP rules are done, store the mark to connmark. Required only once: at the end.
iptables -t mangle -A PREROUTING -m mark ! --mark 0 -j CONNMARK --save-mark
Note: with net.netfilter.nf_conntrack_tcp_loose=1 (which is the default anyway), -p tcp --tcp-flags SYN SYN should probably be replaced with -p tcp -m conntrack --ctstate NEW (or even used without -p tcp at all, to properly handle UDP, ICMP etc.). If this router reboots (and thus loses its memory of flows), it can then pick up established connections on the fly as new, and when a packet from "outside" to "inside" is seen again for the first time, the proper mark of the flow is restored (if the first packets seen for this flow go from "inside" to "outside", they will still be temporarily mis-routed). Currently it can never properly re-mark such a flow: there is no SYN in its traffic anymore.
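As a minimal sketch of the per-POP rule with that replacement applied: the helper name pop_mark_rule is hypothetical, and it only prints the command so it can be reviewed before being run as root. The interface, set name and mark are the ones from this example.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the per-POP marking rule,
# matching the first packet conntrack sees for a flow instead of a TCP SYN.
# $1 = ingress interface of the POP, $2 = firewall mark to set
pop_mark_rule() {
    iface=$1 mark=$2
    printf '%s\n' "iptables -t mangle -A PREROUTING -i $iface -p tcp -m conntrack --ctstate NEW -m set --match-set networks dst -j MARK --set-mark $mark"
}

# The rule for the POP used in this example:
pop_mark_rule e-kiev1 10001
```

Removing -p tcp from the printed rule would extend the same marking to UDP, ICMP etc., as mentioned above.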
Proper policy routing requires more than the default route
There's a flaw in the routing part: the alternate routing table contains only a single default route. This sends packets back where they came from before they can reach the intended target, creating a loop that ends with the traffic being dropped.
Example following OP's example:
- New TCP packet with SYN from new flow arrives on interface e-kiev1
- mangle/PREROUTING sees the packet first and assigns firewall mark 10001 to it
- routing stack sees the packet
- its fwmark 10001 matches the low prio (45) routing rule
- packet traverses routing table 10001
- packet matches default route to interface e-kiev1
- packet is sent back to where it came from without ever reaching the intended target
- next-hop router (which was actually "previous-hop" router) sees (again) this packet and routes it (again) to the current router
- LOOP: New TCP packet with SYN from new flow arrives on interface e-kiev1
- Inherits mark (or is marked again, doesn't matter)
- (repeat)
- IPv4's TTL eventually reaches 0: packet dropped
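The loop described above can probably be confirmed on the router by asking the routing stack what it would do with a marked packet arriving on e-kiev1. The sketch below is only an aid for reading that answer: egress_dev is a hypothetical helper, SRC and DST are placeholders for an outside client and an inside target, and the command has to be run on the router itself.

```shell
#!/bin/sh
# Hypothetical helper: prints the interface named after the first "dev"
# keyword found in "ip route get" output read from stdin.
# Prints nothing if no "dev" keyword is present.
egress_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Possible use on the router (SRC and DST are placeholders to fill in):
#   ip route get "$DST" from "$SRC" iif e-kiev1 mark 10001 | egress_dev
# If this prints e-kiev1, the marked packet would be sent back out of the
# interface it arrived on.
```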
All internal routes in the main table, meaning those that don't involve specific anycast point of presence (POP) routers, should be duplicated to every additional routing table so that the packet can reach its internal target. If the main table also has routes specific to one POP, they should of course be duplicated too, but only to that POP's additional table rather than to all additional tables.
Another, easier method, which might or might not work depending on the complete setup, is to consult the main routing table first while bypassing its default route, and only then the additional POP routing tables. A single additional routing rule for the whole setup replaces the duplication, and thus the maintenance burden, in each additional routing table.
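One possible way to do the duplication, as a sketch only: the hypothetical helper copy_routes_cmds reads the output of ip route show on stdin and prints one ip route replace command per non-default route, so the result can be reviewed before being applied. It assumes every line of that output is acceptable as ip route input, which may not hold for lines carrying state flags, and it does not separate per-POP routes from common ones.

```shell
#!/bin/sh
# Hypothetical helper: prints commands, it does not change any routing table.
# stdin: output of "ip -4 route show table main"
# $1:    additional routing table to populate
copy_routes_cmds() {
    table=$1
    while read -r route; do
        case $route in
            ''|default*) continue ;;   # skip empty lines and the default route
        esac
        # "replace" rather than "add", so re-running does not fail on
        # routes that already exist in the target table
        printf 'ip route replace %s table %s\n' "$route" "$table"
    done
}

# Possible use as root, for table 10001 of this example:
#   ip -4 route show table main | copy_routes_cmds 10001        # review
#   ip -4 route show table main | copy_routes_cmds 10001 | sh   # apply
```

This has to be repeated whenever the main table changes, which is the maintenance burden of this approach.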
For this example, one can just add:
ip rule add prio 40 table main suppress_prefixlength 0
There must be no other routing rule (except of course the rule for the local routing table) with a priority value lower than this rule's, i.e. evaluated before it.
table main suppress_prefixlength 0 means: "if the main table matched with its default route (i.e. 0.0.0.0/0) rather than a more specific route, ignore the result as if there was no matching route yet and continue evaluating further routing rules."
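The decision this rule makes can be pictured with a toy model. This is not kernel code, and route_kept is a hypothetical name: a result from the table is kept only when the prefix length of the matched route is greater than the suppress_prefixlength value.

```shell
#!/bin/sh
# Toy model, not kernel code.
# $1 = prefix length of the route matched in the table
# $2 = suppress_prefixlength value of the routing rule
# Succeeds if the lookup result is kept, fails if it is suppressed.
route_kept() {
    prefixlen=$1 suppress=$2
    [ "$prefixlen" -gt "$suppress" ]
}

# With suppress_prefixlength 0:
route_kept 0 0  || echo "default route (/0): ignored, next rule is evaluated"
route_kept 24 0 && echo "/24 route: used"
```

In this example, a packet for an internal network is therefore routed by the main table at prio 40, while a packet that would only match the main table's default route falls through to the fwmark rule at prio 45.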