In this answer I'm leaving aside the mathematical/complexity considerations (one algorithm might be better suited than the other depending on the type of problem and the size of the pattern), which would normally be the only ones that matter.
Various references: 1, 2, 3.
Linux's xtables implementation recently (2023-06) received two patches in response to a long-standing (2019-12) bug report: Bug 1390 - iptables -m string not working with --algo bm and OUTPUT chain under 5.3.x
Under 5.3.x, iptables -A OUTPUT -p tcp -m string --algo bm --string POST -j DROP
does not drop outgoing packets containing "POST". This
command was instead working as intended with 5.0.0.
One part was a bug that got fixed, but the other part is a known limitation that has existed for as long as the algorithm has been available.
Hence an update to the iptables-extensions documentation, to draw more attention to it:
man: string: document BM false negatives
For non-linear skb's there's a possibility that the kernel's
Boyer-Moore text-search implementation may miss matches. There's a
warning about this in the kernel source. Include that warning in the
man-page.
It's the same warning as the one present in kernel sources (since 2005):
* Note: Since Boyer-Moore (BM) performs searches for matchings from right
* to left, it's still possible that a matching could be spread over
* multiple blocks, in that case this algorithm won't find any coincidence.
*
* If you're willing to ensure that such thing won't ever happen, use the
* Knuth-Pratt-Morris (KMP) implementation instead. In conclusion, choose
* the proper string search algorithm depending on your setting.
*
* Say you're using the textsearch infrastructure for filtering, NIDS or
* any similar security focused purpose, then go KMP. Otherwise, if you
* really care about performance, say you're classifying packets to apply
* Quality of Service (QoS) policies, and you don't mind about possible
* matchings spread over multiple fragments, then go BM.
*/
So in the presence of a non-linear skbuff (the skbuff is the kernel object holding a packet, and its data can sometimes be split across multiple memory blocks), the BM algorithm can fail to find a match that KMP would find. Non-linear skbuffs can result from acceleration available in NICs and drivers, from features like TCP Segmentation Offload or various tunnel offload features, etc.
The warning boils down to:
- correctness? stick to KMP
- performance? you can use BM
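
To make the non-linear case more concrete, here is a minimal Python sketch of the effect. It is not the kernel code: the blocks are plain byte strings standing in for the memory blocks of a non-linear skbuff, `per_block_search` uses Python's `in` operator instead of an actual Boyer-Moore implementation (the point is only that no state is kept between blocks), and all function names are made up for this illustration. The pattern is assumed to be non-empty.

```python
def failure_table(pattern):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def per_block_search(blocks, pattern):
    """Search every block on its own; nothing is remembered between blocks.
    (Stand-in for the BM behaviour described in the warning.)"""
    return any(pattern in block for block in blocks)


def streaming_kmp_search(blocks, pattern):
    """KMP where the match state k survives block boundaries."""
    fail = failure_table(pattern)
    k = 0  # number of pattern bytes matched so far
    for block in blocks:
        for byte in block:
            while k and byte != pattern[k]:
                k = fail[k - 1]
            if byte == pattern[k]:
                k += 1
            if k == len(pattern):
                return True
    return False


payload = b"POST /index.html HTTP/1.1"
linear = [payload]                       # all data in one block
non_linear = [payload[:2], payload[2:]]  # b"PO" | b"ST /index.html ..."

print(per_block_search(linear, b"POST"))          # True
print(per_block_search(non_linear, b"POST"))      # False: match straddles the boundary
print(streaming_kmp_search(non_linear, b"POST"))  # True
```

The same payload matches or not depending only on where the block boundary falls, which is why the result of a `--algo bm` rule can differ between systems or kernel versions while the `--algo kmp` result does not.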