Score:1

Whitelist cgroup from wireguard VPN

I have a WireGuard VPN called wg0, set up and enabled through NetworkManager. I want to allow a program to access the internet directly without going through the tunnel. For this, I’m trying to match by cgroupv2.

Here’s what the routing looks like:

> ip -4 addr show dev wlan0
2: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 10.126.232.253/16 brd 10.126.255.255 scope global dynamic noprefixroute wlan0
       valid_lft 43135sec preferred_lft 43135sec

> ip -4 addr show dev wg0
6: wg0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 10.118.53.99/32 scope global noprefixroute wg0
       valid_lft forever preferred_lft forever

> ip -4 rule
0:      from all lookup local
31760:  from all lookup main suppress_prefixlength 0
31761:  not from all fwmark 0xcc4d lookup 52301
32766:  from all lookup main
32767:  from all lookup default

> ip -4 route list table local
local 10.118.53.99 dev wg0 proto kernel scope host src 10.118.53.99
local 10.126.232.253 dev wlan0 proto kernel scope host src 10.126.232.253
broadcast 10.126.255.255 dev wlan0 proto kernel scope link src 10.126.232.253
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1

> ip -4 route list table main
default via 10.126.255.254 dev wlan0 proto dhcp src 10.126.232.253 metric 600
10.126.0.0/16 dev wlan0 proto kernel scope link src 10.126.232.253 metric 600

> ip -4 route list table 52301
default dev wg0 proto static scope link metric 20050

Now from reading up on wg-quick routing and trying to understand the Netfilter packet flow, I have created a table called "bypass" that matches my terminal’s cgroup and adds the fwmark 0xcc4d:

> sudo nft list tables
table inet bypass

> sudo nft list table inet bypass
table inet bypass {
        chain out {
                type filter hook output priority mangle; policy accept;
                socket cgroupv2 level 5 "user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-698af79930294eb09aa9231b0a8bd258.scope" log prefix "cgroup_matched out "
                socket cgroupv2 level 5 "user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-698af79930294eb09aa9231b0a8bd258.scope" meta mark 0x00000000 meta oiftype != loopback meta mark set 0x0000cc4d
        }

        chain check {
                type filter hook postrouting priority mangle; policy accept;
                ip daddr 34.160.111.145 log prefix "postroute check "
        }
}

> cat /proc/$$/cgroup
0::/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-698af79930294eb09aa9231b0a8bd258.scope
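
For reference, the level in the socket cgroupv2 match is the depth of the cgroup path, and the path is written without the leading "0::/". A minimal sketch of deriving both from a /proc/<pid>/cgroup line, assuming a pure cgroup v2 host where that file has a single "0::/..." line; cgroup_match_args is a made-up helper name, not an nftables tool:

```shell
# Print the nft "socket cgroupv2" match for one cgroup v2 line.
# cgroup_match_args is a hypothetical helper, not part of nftables.
# $1: a line as found in /proc/<pid>/cgroup, e.g. "0::/a/b/c"
cgroup_match_args() (
    path=${1#0::/}    # drop the cgroup v2 "0::/" prefix
    # The level is the number of path components: "/" separators plus one.
    slashes=$(printf '%s' "$path" | tr -cd '/' | wc -c)
    level=$((slashes + 1))
    printf 'socket cgroupv2 level %d "%s"\n' "$level" "$path"
)
```

It could be called from the terminal whose traffic should bypass the tunnel, for example as cgroup_match_args "$(grep '^0::' /proc/$$/cgroup)".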

Testing it out, packets seem to be matched correctly, but they still leave through the tunnel:

> host ifconfig.me
34.160.111.145

> sudo conntrack -L -d 34.160.111.145
conntrack v1.4.7 (conntrack-tools): 0 flow entries have been shown.

> d=`date +'%Y-%m-%d %H:%M:%S'`

> curl https://ifconfig.me; echo
<tunnel public IP>  # Expect to see my own public IP

> sudo journalctl --dmesg --since "$d" -n 2
Apr 14 12:40:56 <hostname> kernel: cgroup_matched out IN= OUT=wg0 SRC=10.118.53.99 DST=34.160.111.145 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=26291 DF PROTO=TCP SPT=48504 DPT=443 WINDOW=64860 RES=0x00 SYN URGP=0
Apr 14 12:40:56 <hostname> kernel: postroute check IN= OUT=wg0 SRC=10.118.53.99 DST=34.160.111.145 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=26291 DF PROTO=TCP SPT=48504 DPT=443 WINDOW=64860 RES=0x00 SYN URGP=0 MARK=0xcc4d

As I understand this, the fwmark 0xcc4d is correctly applied to outgoing packets at mangle, which we see at postrouting. However, at postrouting, i.e. after the reroute check on the Netfilter flow diagram, we still have OUT=wg0 SRC=10.118.53.99, even though, as I understand it:

  • changing the fwmark in mangle should trigger the reroute (see this thread and the Linux ipt mangle source code)
  • anything marked with fwmark 0xcc4d should not look up table 52301 and should therefore use the route default via 10.126.255.254 dev wlan0 proto dhcp src 10.126.232.253
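
The second expectation can be checked on its own, independently of nftables, by asking the kernel for its routing decision with and without the mark. A rough sketch, assuming iproute2's "ip route get ... mark" syntax; route_dev and compare_routes are hypothetical helper names. This only tests the policy rules, not when the mark becomes available to the routing code:

```shell
# Extract the outgoing device from "ip route get" output read on stdin.
# route_dev is a hypothetical helper name.
route_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p'
}

# Show the device chosen for a destination, unmarked and with a fwmark.
# $1: destination address, $2: fwmark (e.g. 0xcc4d)
compare_routes() (
    dst=$1
    mark=$2
    printf 'unmarked: %s\n' "$(ip -4 route get "$dst" | route_dev)"
    printf 'mark %s: %s\n' "$mark" "$(ip -4 route get "$dst" mark "$mark" | route_dev)"
)
```

For the setup above, this would be run as compare_routes 34.160.111.145 0xcc4d. If the marked lookup reports wlan0 while real traffic still leaves through wg0, the rules themselves are fine and the problem lies in when the mark is set relative to the routing decision.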

So what am I missing, and how can I fix this?


I’m running openSUSE Tumbleweed (20230411) with:

  • Linux kernel 6.2.9
  • libnftables1 1.0.7
  • iproute2 6.2
  • wireguard-tools 1.0.20210914
  • conntrack-tools 1.4.7
  • NetworkManager 1.42.4
Score:1

I’ve gotten to a solution that seems to work:

  • To change the output device, the nftables chain needs to be of type route. However, this does not change the source IP, which stays as the tunnel’s private address.
  • To change this address, I’ve added a nat chain that masquerades the source IP.

table inet bypass {
        chain in {
                type filter hook prerouting priority raw; policy accept;
                socket cgroupv2 level 5 "user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-aed769efd3b74de792530f9a71b0c14b.scope" meta mark 0x00000000 meta oiftype != loopback meta mark set 0x0000cc4d
        }

        chain out {
                type route hook output priority mangle; policy accept;
                socket cgroupv2 level 5 "user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-aed769efd3b74de792530f9a71b0c14b.scope" meta mark 0x00000000 meta oiftype != loopback meta mark set 0x0000cc4d
        }

        chain nat {
                type nat hook postrouting priority srcnat; policy accept;
                meta oif wlan0 meta mark 0x0000cc4d masquerade
        }
}

I’m unsure whether this is a proper solution or hacky.

I think the setup may also benefit from using separate conntrack zones for tunneled and non-tunneled traffic as they can target the same IPs.

A.B: As soon as a Netfilter reroute and a fwmark are involved, as you discovered, the source address will be wrong and a NAT band-aid is needed. The only way I know to avoid this is not to be in this situation in the first place: not relying on Netfilter and/or not relying on the output hook. In theory, I could imagine that socket marks set by an enforced eBPF filter on the socket could work ( https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8fd682072335e98b53823c89efa4d2460e79a3d5 ), but I have no idea how to implement this.
A.B: For testing purposes only, without requiring eBPF, and only for a privileged process (one that can use setsockopt(..., SO_MARK, ...), so not for user 1000), this could be done with a wrapper (LD_PRELOAD...) that sets the socket mark whenever socket(2), and possibly other related system calls, are made. The goal is to have the mark available *before* the first routing check.
Cimbali: Thanks @A.B for confirming NAT is needed (I’ll mark this as accepted then) and for the interesting further reading.