There is currently no direct way to do this with nftables: its raw payload expressions are too limited. A TCP segment has no header field that directly gives the data length. It only has its own header length, available as @th,96,4 << 2 (the Data Offset in 32-bit words, converted to bytes), and that's not useful to reach the data part. iptables' u32 match can do better, but still not enough: it can chain computations to get a pointer to the start of the TCP data payload and match its content (nftables' very recent @ih payload base might perhaps do the same). Still, since it can't do subtractions and isn't flexible enough, it can't be used to compute the data payload length. It might perhaps have been good enough to merely detect the presence of data ("Any access of memory outside [skb->data,skb->end] causes the match to fail.": for a data size between 1 and 3 bytes, the behavior might be undefined).
For what it's worth, tcpdump's BPF can do such a computation for IPv4 because it knows how to do subtractions:
tcpdump 'ip and (tcp[tcpflags] & (tcp-syn|tcp-rst|tcp-ack|tcp-fin) == tcp-syn) and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) > 0)'
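As an illustration of that arithmetic (not part of the original setup), here is the same formula as a POSIX shell function. The name tcp_payload_len is hypothetical, and the three header fields are assumed to be already extracted from the packet. Note that (tcp[12]&0xf0)>>2 in the filter is simply Data Offset x 4, since the field occupies the high nibble of TCP byte 12.

```shell
#!/bin/sh
# Hypothetical helper: TCP payload length over IPv4, same formula as the
# tcpdump filter: Total Length - IHL*4 - Data Offset*4.
# $1 = IP Total Length (bytes)
# $2 = IHL (32-bit words)
# $3 = TCP Data Offset (32-bit words)
tcp_payload_len() {
    echo $(( $1 - $2 * 4 - $3 * 4 ))
}

# SYN with a 20-byte IP header and a 40-byte TCP header (options), no data
tcp_payload_len 60 5 10    # prints 0
# SYN with a 20-byte IP header, a 32-byte TCP header and 100 bytes of data
tcp_payload_len 152 5 8    # prints 100
```

A rule would only need to check whether this value is greater than zero, which is exactly the subtraction that nftables' payload expressions can't express.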
Using a lookup table for the IPv4 case
What can't be calculated at runtime can sometimes be pre-calculated and put in a lookup table instead. For IPv4 and nftables, one can build a lookup table with all valid values for the triplet (IP Total Length, IHL, Data Offset) where the TCP data size would be zero, and match (or fail to match) this lookup table.
- possible IP header size in 4-byte words (IHL): between 5 and 15.
- possible TCP header size in 4-byte words (Data Offset): between 5 and 15 too.
This gives 11 x 11 = 121 possibilities, each satisfying Total Length = IHL x 4 + DO x 4 (i.e. no TCP data).
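A quick POSIX shell enumeration of the same ranges (a sketch, only meant to check the counts) confirms the 121 triplets. It also shows why all three fields are needed in the set: the triplets map to only 21 distinct Total Length values (40 to 120 bytes), so a given length such as 44 is data-free with (IHL 5, DO 6) or (IHL 6, DO 5) but carries 4 bytes of data with (IHL 5, DO 5).

```shell
#!/bin/sh
# Enumerate IHL and Data Offset in 5..15 and record the Total Length
# of an IPv4 packet whose TCP segment carries no data.
count=0
lengths=""
ihl=5
while [ "$ihl" -le 15 ]; do
    dof=5
    while [ "$dof" -le 15 ]; do
        count=$(($count + 1))
        lengths="$lengths $(( $ihl * 4 + $dof * 4 ))"
        dof=$(($dof + 1))
    done
    ihl=$(($ihl + 1))
done

# Number of triplets, then number of distinct Total Length values
# ($(( )) strips the padding some wc implementations add)
distinct=$(( $(printf '%s\n' $lengths | sort -n -u | wc -l) ))
echo "$count"       # prints 121
echo "$distinct"    # prints 21
```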
Note: with IPv6 and its variable number of extension headers between the fixed header and the final protocol (TCP) header, such a method can't be used, because the lookup table(s) would probably be huge instead of just 121 elements.
tcpsynzero.nft:
table ip tcpsynzero # for idempotence
delete table ip tcpsynzero # for idempotence
table ip tcpsynzero {
	flags dormant
	set validtriplet {
		typeof ip length . @nh,4,4 . @th,96,4;
	}
	chain dropsynwithdata {
		tcp flags syn ip length . @nh,4,4 . @th,96,4 != @validtriplet counter drop
	}
	chain prerouting {
		type filter hook prerouting priority 0; policy accept;
		goto dropsynwithdata
	}
	chain output {
		type filter hook output priority 0; policy accept;
		goto dropsynwithdata
	}
}
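The raw payload expressions in this ruleset use the @base,offset,length syntax, where offset and length are in bits, counted from the most significant bit of the base header's first byte: @nh,4,4 is the low nibble of the first IPv4 byte (IHL), @th,96,4 is the high nibble of TCP byte 12 (Data Offset), and ip length corresponds to @nh,16,16. The hypothetical POSIX shell function rawbits below mimics this extraction on a list of header bytes, only to illustrate the offsets; it is not how nftables implements it.

```shell
#!/bin/sh
# Hypothetical helper mimicking nft's @base,offset,length raw payload syntax:
# rawbits OFFSET LENGTH BYTE...  reads LENGTH bits starting at bit OFFSET,
# bit 0 being the most significant bit of the first byte.
rawbits() {
    off=$1
    len=$2
    shift 2
    end=$(($off + $len))
    val=0
    idx=0
    for byte in "$@"; do
        b=0
        while [ "$b" -lt 8 ]; do
            pos=$(($idx * 8 + $b))
            if [ "$pos" -ge "$off" ] && [ "$pos" -lt "$end" ]; then
                # append this bit (counted from the byte's MSB) to the result
                val=$(( ($val << 1) | (($byte >> (7 - $b)) & 1) ))
            fi
            b=$(($b + 1))
        done
        idx=$(($idx + 1))
    done
    echo "$val"
}

# First 4 bytes of an IPv4 header: version 4 / IHL 5, TOS 0, Total Length 60
rawbits 4 4 0x45 0x00 0x00 0x3c      # @nh,4,4 (IHL): prints 5
rawbits 16 16 0x45 0x00 0x00 0x3c    # @nh,16,16 (ip length): prints 60
# First 13 bytes of a TCP header, byte 12 = 0x50 (Data Offset 5)
rawbits 96 4 0 0 0 0 0 0 0 0 0 0 0 0 0x50    # @th,96,4: prints 5
```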
Load it first with nft -f tcpsynzero.nft
(the table is loaded but dormant, thus not enabled, because while the set is still empty it would drop all SYNs).
Bash script generatetriplets.bash, which generates the nftables commands to populate the set:
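#!/bin/bash
# IHL (IP header length) and Data Offset (TCP header length), both in 32-bit words
for ihl in {5..15}; do
    for _do in {5..15}; do    # "do" is a reserved word, hence "_do"
        # Total Length of an IPv4 packet whose TCP segment carries no data
        l=$((ihl*4+_do*4))
        printf 'add element ip tcpsynzero validtriplet { %d . %#x . %#x }\n' "$l" "$ihl" "$_do"
    done
done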
Populate the set with:
bash generatetriplets.bash | nft -f -
Finally, enable the table (redeclaring the table is almost a no-op, except that it removes the dormant flag):
nft add table ip tcpsynzero
Test
Tested on Linux 6.1.x with nftables 1.0.7, by (ab)using TCP Fast Open with a few forced modes:
Linux server at 192.0.2.2, where the nftables ruleset was installed, with
sysctl -w net.ipv4.tcp_fastopen=0x602
and
socat tcp4-listen:8080,reuseaddr,fork -
Linux client with
sysctl -w net.ipv4.tcp_fastopen=0x5
and
working immediately:
curl http://192.0.2.2:8080/
1st SYN blocked (and retried with a normal SYN after 1s):
curl --tcp-fastopen http://192.0.2.2:8080/
Note: when the nftables ruleset is installed on the client rather than on the server, there is no delay: the drop verdict in the output hook triggers an immediate error on the socket, and the kernel immediately retries with a normal SYN. The counter in the drop rule is of course still incremented, since a packet was still dropped.
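The "forced modes" are bit flags of the net.ipv4.tcp_fastopen sysctl. As summarized from the kernel's ip-sysctl documentation (worth double-checking for the kernel in use): 0x1 enables client support, 0x2 enables server support, 0x4 makes the client send data in the opening SYN even without a cookie, 0x200 makes the server accept data in a SYN without any cookie option, and 0x400 enables Fast Open on all listeners without the TCP_FASTOPEN socket option. The small POSIX shell sketch below (decode_tfo is a hypothetical name) decodes the two values used in this test.

```shell
#!/bin/sh
# Hypothetical helper: list which net.ipv4.tcp_fastopen bits are set.
# Bit meanings summarized from the kernel's ip-sysctl documentation.
decode_tfo() {
    v=$(( $1 ))
    [ $(( $v & 0x1 )) -ne 0 ]   && echo "0x1: client enabled"
    [ $(( $v & 0x2 )) -ne 0 ]   && echo "0x2: server enabled"
    [ $(( $v & 0x4 )) -ne 0 ]   && echo "0x4: client sends data in SYN without cookie"
    [ $(( $v & 0x200 )) -ne 0 ] && echo "0x200: server accepts data in SYN without cookie"
    [ $(( $v & 0x400 )) -ne 0 ] && echo "0x400: all listeners support TFO without TCP_FASTOPEN"
    return 0
}

echo "server (0x602):"
decode_tfo 0x602    # bits 0x2, 0x200 and 0x400
echo "client (0x5):"
decode_tfo 0x5      # bits 0x1 and 0x4
```

Together, these force the client to put data in its first SYN and the server to accept it, which is what exercises the drop rule.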
A similar workaround was used in another nftables case: Nftables timestamp map