Score:4

What is the difference BM and KMP algorithms in iptables string search?

in flag

Iptables has a string module for searching packet contents. But it forces you to choose between one these algorithms: Boyer-Moore (bm) and Knuth-Pratt-Morris (kmp). I couldn't find any good documentation on which cases should I use one algorithm over the other.

Which one is better?

The other issue I found is that BM sometimes misses packets. I've found reports of people having the same issue.

# iptables -A OUTPUT -m string --string teststring --algo bm
# iptables -A OUTPUT -m string --string teststring --algo kmp

# wget -O/dev/null http://www.example.com/teststring

# iptables -nvL OUTPUT
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0            0    --  *      *       0.0.0.0/0            0.0.0.0/0            STRING match  "teststring" ALGO name bm
    1   192            0    --  *      *       0.0.0.0/0            0.0.0.0/0            STRING match  "teststring" ALGO name kmp

As you can see, the BM rule didn't match despite being placed first.

Score:4
cl flag
A.B

In this answer I'm leaving aside the mathematics/complexity considerations (one algorithm might be more adequate depending on the type of problem and size of the pattern) that should normally have been the only considerations to have.

Various references: 1, 2, 3.


The Linux's xtables implementation has recently (2023-06) received two patches coming after a longstanding (2019-12) bug report: Bug 1390 - iptables -m string not working with --algo bm and OUTPUT chain under 5.3.x

Under 5.3.x, iptables -A OUTPUT -p tcp -m string --algo bm --string POST -j DROP does not drop outgoing packets containing "POST". This command was instead working as intended with 5.0.0.

one part was a bug that got fixed, but the other part is a known limitation since the algorithm is available.

Thus an iptables-extensions documentation update to get more attention:

man: string: document BM false negatives

For non-linear skb's there's a possibility that the kernel's Boyer-Moore text-search implementation may miss matches. There's a warning about this in the kernel source. Include that warning in the man-page.

It's the same warning as the one present in kernel sources (since 2005):

 *   Note: Since Boyer-Moore (BM) performs searches for matchings from right 
 *   to left, it's still possible that a matching could be spread over 
 *   multiple blocks, in that case this algorithm won't find any coincidence.
 *   
 *   If you're willing to ensure that such thing won't ever happen, use the
 *   Knuth-Pratt-Morris (KMP) implementation instead. In conclusion, choose 
 *   the proper string search algorithm depending on your setting. 
 *
 *   Say you're using the textsearch infrastructure for filtering, NIDS or 
 *   any similar security focused purpose, then go KMP. Otherwise, if you 
 *   really care about performance, say you're classifying packets to apply
 *   Quality of Service (QoS) policies, and you don't mind about possible
 *   matchings spread over multiple fragments, then go BM.
 */

So in presence of a non-linear skbuff (the kernel object handling any packet, where data can sometimes be split in multiples memory blocks), possibly due to acceleration available in NIC and drivers, possibly with features like TCP Segmentation Offload or various tunnel offload features, etc., the BM algorithm could fail to find a result when KMP will find one.

The warning tells:

  • correctness? stick to KMP
  • performance? you can use BM
A.B avatar
cl flag
A.B
Btw, the bug (increasing chances of not finding with non-linear skbuffs since only the first data block, like the packet's header, was completely searched)'s fix might not have been available everywhere yet. It's there: https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git/commit/?id=6f67fbf8192da80c4db01a1800c7fceaca9cf1f9 . For example the [6.1.y fix](https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-6.1.y&id=cfcb9f0a499dafadcbfe43b1c587f20e4e953b19) is in 6.1.39 (2023-07-19)
Score:1
as flag

The Boyer-Moore (BM) and Knuth-Pratt-Morris (KMP) algorithms are both string search algorithms used for pattern matching. In the context of the iptables string module, they are used to search for specific patterns within packet contents. However, the choice between these two algorithms might not be as straightforward as one being definitively "better" than the other.

Here's a brief overview of both algorithms and their differences:

  1. Boyer-Moore (BM) Algorithm:

    • The BM algorithm is known for its efficient handling of mismatches. It preprocesses the pattern to create two tables: the "bad character" table and the "good suffix" table.
    • BM scans the packet content from right to left, matching the pattern against the content. It shifts the pattern by a larger amount based on the information in the two tables, thus skipping unnecessary comparisons.
    • BM is particularly effective for longer patterns and cases where the alphabet (set of characters in the pattern) is relatively small.
  2. Knuth-Pratt-Morris (KMP) Algorithm:

    • The KMP algorithm focuses on efficiently handling partial matches. It preprocesses the pattern to create a "failure" table that guides the pattern shift when a mismatch occurs.
    • KMP scans the packet content from left to right, adjusting its position based on the information in the failure table to avoid rechecking previously matched characters.
    • KMP is often preferred when the pattern has repetitive elements or a small alphabet.

Regarding your observation that the BM rule didn't match despite being placed first, there could be several reasons for this behavior, including the specifics of your environment and the characteristics of the pattern and packet content. The order of rules in an iptables chain can matter, and if a previous rule matches and terminates the packet processing, subsequent rules might not be evaluated.

It's important to note that neither algorithm is universally superior; their performance can vary depending on the specifics of the pattern, packet content, and other factors. In some cases, BM might perform better, while in others, KMP might be more efficient.

If you're experiencing issues with the BM algorithm missing packets, it could be due to specific characteristics of your patterns or packet contents that are affecting its performance. It's a good idea to analyze the patterns you're searching for, the content of the packets, and the system's overall configuration to determine the most suitable algorithm for your use case.

In practice, you might need to experiment with both algorithms and possibly other options to find the best solution for your specific scenario.

I sit in a Tesla and translated this thread with Ai:

mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.