Score:3

Linux TCP payload filter

I use nftables, but it has nothing like iptables' string or u32 matches, so I can't reliably locate the payload offset. If nftables' raw payload expressions can't do it, how can I analyze the TCP payload efficiently without a user-space bottleneck? (nfqueue is not an option, for performance reasons.)

Are there any techniques to do such filtering at the kernel level? Even just marking suitable packets is enough for me; the rest can easily be done at the firewall.

Score:0
A.B

Kernel code that handles arbitrary string matching already exists, and it runs in kernel context in the packet path, so why not reuse it? iptables is not going anywhere. What might disappear someday is the legacy kernel API for iptables, leaving only iptables-nft, which can still use xtables modules such as the string match module. Using either iptables-legacy or iptables-nft alongside nftables gives the same result below.

It's possible to use marks to pass messages around various network subsystems in the packet path, including from nftables to iptables and then back from iptables to nftables.

This Priority within hook table can help:

nftables families           Typical hooks  nft keyword  Value  Netfilter internal priority  Description
[...]
inet, ip, ip6               all            mangle       -150   NF_IP_PRI_MANGLE             Mangle operation
inet, ip, ip6               prerouting     dstnat       -100   NF_IP_PRI_NAT_DST            Destination NAT
inet, ip, ip6, arp, netdev  all            filter       0      NF_IP_PRI_FILTER             Filtering operation, the filter table
[...]

One just needs to register an nftables chain twice for each iptables built-in chain involved: once with a priority just before iptables' and once with a priority just after it. For example, around a filter/OUTPUT chain with iptables priority 0, one can use -5 and 5. More details are in these two Unix/Linux SE Q/A where I wrote answers.


Artificial example (adapted from an example in iptables-extensions(8)) mixing nftables and iptables, where the system should drop locally initiated DNS requests for the specific DNS name www.netfilter.org. The rules are presented in hook priority order. Caveat: this is UDP only, and because of the fixed range it works only when the outgoing packet carries no IPv4 options or IPv6 extension headers, but the range can be relaxed to accommodate them:

nft add table inet mytable
nft add chain inet mytable outputbefore '{ type filter hook output priority -5; policy accept; }'
nft add rule inet mytable outputbefore udp dport 53 meta mark set 1

iptables -I OUTPUT -m mark --mark 1 -m string --algo bm --from 40 --to 57 --hex-string '|03|www|09|netfilter|03|org|00|' -j MARK --set-mark 2
ip6tables -I OUTPUT -m mark --mark 1 -m string --algo bm --from 60 --to 77 --hex-string '|03|www|09|netfilter|03|org|00|' -j MARK --set-mark 2

nft add chain inet mytable outputafter '{ type filter hook output priority 5; policy accept; }'
nft add rule inet mytable outputafter meta mark 2 drop
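
The --hex-string argument used above is the DNS wire encoding of the name: each label is preceded by its length as one byte, and the name ends with a zero byte. As a minimal sketch for building the same argument for another name (dns_hex_string is a hypothetical helper name, and it assumes plain letter/digit/hyphen labels with no '|' characters), something like this could be used:

```shell
#!/bin/sh
# Encode a DNS name as length-prefixed labels, written in the
# |hex|text form that the string match accepts for --hex-string.
# dns_hex_string is an illustrative helper, not part of iptables.
dns_hex_string() {
    name=$1
    out=
    old_ifs=$IFS
    IFS=.
    # With IFS set to '.', the unquoted expansion splits the name into labels.
    for label in $name; do
        # Label length as two hex digits, followed by the label itself.
        out="${out}|$(printf '%02x' "${#label}")|${label}"
    done
    IFS=$old_ifs
    # A zero-length label terminates the name.
    printf '%s|00|\n' "$out"
}

dns_hex_string www.netfilter.org
```

The result can then be passed as --hex-string "$(dns_hex_string www.netfilter.org)". The --from and --to offsets are not computed here and still have to be chosen by hand.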

The goal here is to have iptables handle only the part that nftables can't, and to do only the minimal work: return a value through a mark, leaving nftables in charge of the fate of the packet:

  • nftables sets the packet's mark to 1 to "ask" iptables to perform some work
  • iptables (or ip6tables) performs a string match only if it "received" a mark of 1, to spare CPU use, and "answers" 2 if the string matched in such case
  • nftables drops the packet only if it "received" a mark of value 2 (thus deleting the iptables rules also disables the effect)
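
The three steps above use the whole 32-bit mark. Where other tools also set marks, the same handshake can be confined to a few bits. The following is a minimal sketch, assuming the low byte (mask 0xff) is free for this purpose. MASK, REQ and ANS are illustrative values, not part of the example above. It only prints the IPv4 commands so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Print (do not apply) the handshake rules restricted to the bits in MASK.
# MASK, REQ and ANS are assumptions chosen for illustration.
MASK=0xff   # bits reserved for the nftables <-> iptables handshake
REQ=0x1     # "please check this packet"
ANS=0x2     # "the string matched"

masked_handshake() {
    # Bits outside MASK must survive when the mark is rewritten.
    keep=$(printf '0x%08x' $(( 0xffffffff & ~MASK )))
    # nftables: clear the reserved bits, then set the request value.
    echo "nft add rule inet mytable outputbefore udp dport 53 meta mark set meta mark and $keep xor $REQ"
    # iptables: test and set only the reserved bits (value/mask form).
    echo "iptables -I OUTPUT -m mark --mark $REQ/$MASK -m string --algo bm --from 40 --to 57 --hex-string '|03|www|09|netfilter|03|org|00|' -j MARK --set-xmark $ANS/$MASK"
    # nftables: compare only the reserved bits before dropping.
    echo "nft add rule inet mytable outputafter meta mark and $MASK == $ANS drop"
}

masked_handshake
```

The ip6tables rule would follow the same pattern with the IPv6 offsets used earlier.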

Notes:

  • Caveat

    The communication mechanism between nftables and iptables is through packet marks (or could also be through conntrack connmarks). While it's possible to write to and read from only a few bits of the mark (by using the optional mask on the mark and adequate bitwise operations), every user of marks must then respect some allocation convention for the ownership of bits in the mark. Without this, tools will interfere with each other when handling marks. For example, firewalld uses marks for handling redirections in rules, so this example might not be compatible with firewalld, even though firewalld uses its own tables with the nftables backend.

  • Some easy cases can use raw payloads

    This specific example above, with fixed offsets, could have been implemented with nftables. It's when there's an arbitrary offset to find the data (best used with iptables' string match) or a complex method to compute such offset (best used with iptables' u32 match) that nftables can't be used.

    Here's the equivalent replacing all rules above with a single rule using raw payloads. Syntax allows a maximum of 128 bits only, but here 19x8=152 bits are needed so it has to be split into two raw payloads (128 bits + 24 bits). printf, xxd and cut are also used for some help:

    nft add table inet mytable
    nft add chain inet mytable output '{ type filter hook output priority 0; policy accept; }'
    nft add rule inet mytable output udp dport 53 \
        @th,160,128 0x$(printf '\3%s\11%s\3%s\0' www netfilter org | xxd -p | cut -c-32) \
        @th,288,24 0x$(printf '\3%s\11%s\3%s\0' www netfilter org | xxd -p | cut -c33-) \
        drop
    

    With the output of printf '\3%s\11%s\3%s\0' www netfilter org | xxd -p being:

      03777777096e657466696c746572036f726700
    

    to get the two raw payloads:

    0x03777777096e657466696c746572036f
                                    0x726700
    
  • Other methods are probably available

    • iptables can call an eBPF object (though only recent kernels allow bounded loops, which make some algorithms easier to implement); nftables lacks this feature. So this again means using a mark and iptables.

    • Such an eBPF object could also be used with XDP or tc, but that might be too early in the packet path (e.g. no stateful NAT available), and it requires actual programming rather than administration. Either way, it again means communicating through a mark if decisions have to be handled by nftables.
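
For the raw payload note above, the manual split into a 128-bit and a 24-bit expression can be generalized. This is a minimal sketch, assuming an even number of hex digits and an offset given in bits from the transport header; raw_payload_matches is a hypothetical helper name, not an nft feature:

```shell
#!/bin/sh
# Split a hex string into nft raw payload matches of at most 128 bits each.
# $1: starting offset in bits from the transport header
# $2: hex digits to match, without a 0x prefix
raw_payload_matches() {
    offset=$1
    hex=$2
    sep=
    while [ -n "$hex" ]; do
        # Take up to 32 hex digits (128 bits) for this expression.
        chunk=$(printf '%s' "$hex" | cut -c1-32)
        bits=$(( ${#chunk} * 4 ))
        printf '%s@th,%d,%d 0x%s' "$sep" "$offset" "$bits" "$chunk"
        sep=' '
        # Advance past the bits just consumed.
        offset=$(( offset + bits ))
        hex=$(printf '%s' "$hex" | cut -c33-)
    done
    printf '\n'
}

raw_payload_matches 160 "$(printf '\3%s\11%s\3%s\0' www netfilter org | xxd -p)"
```

With the name used throughout this answer, it prints the same two expressions as the rule above, ready to be placed between udp dport 53 and drop.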
