
Backtesting Historical Logs in fail2ban


Setup: I'm running Apache on an Ubuntu server. I've created a fail2ban rule that bans an IP when it requests too many pages too fast.

# Fail2ban Rule
failregex = ^.*?(:80|:443) <HOST> - .* "(GET|POST|HEAD).*$
ignoreregex =.*(.ico|.jpg|.png|.gif|.js|.css|.woff|.mp4)

findtime = 30
maxretry = 10

Goal:
I would like to run an old Apache log against this new fail2ban rule so I can see whether it would have banned any legitimate requests.

Attempt #1: I thought I might be able to use fail2ban-regex to get a list of potentially banned users, but it doesn't have that functionality.

Attempt #2: I thought echoing the historical logs into the log that fail2ban is currently watching would get them parsed. After fixing a small hangup where log lines with old dates were ignored (fixed by adding a year to them), fail2ban started parsing them and banning IPs. However, I only had to look at the first banned IP to see that it was wrong: that IP had made just 10 requests in total, and they weren't anywhere close to each other in time. I can only assume that fail2ban isn't using the log line's timestamp to determine validity, which makes this testing method a bust.

# echo example
zcat other_vhosts_access.log.8.gz | sed -n 's/\/2022:/\/2032:/p' >> /var/log/apache2/fail2ban_test.log

Conclusion: With both of my previous attempts failing, I can't think of a sane way to approach this problem. Can somebody recommend a way to achieve what I'm after, or offer insight into why my second attempt isn't working?

Score:0

Attempt #1

Strictly speaking it indeed doesn't, but...

Newer versions of fail2ban-regex support an output parameter (-o), so you could do something like this:

fail2ban-client set "$jail" banip $(
   fail2ban-regex -o 'ip' /var/log/path/some.log some-filter | sort --unique | tr '\n' ' '
)

This would only be suitable if you wanted every IP that caused a failure, regardless of count or time. In your case it would be pointless, at least without some extra preprocessing.

Attempt #2 I thought echoing the historical logs into the log which fail2ban is currently watching would make them get parsed.

It would not work because fail2ban could not evaluate the time of each message correctly: the entries would either be too old (if logged unmodified) or carry the wrong time (if the moment of logging were taken as the time of failure, whereas maxretry and findtime have to be evaluated against the times of the real requests). Not to mention that on start fail2ban seeks to now - findtime, because older messages are obsolete and of no interest to it; see https://github.com/fail2ban/fail2ban/issues/2909#issuecomment-758036512.

In any case, at the moment this is hardly possible with the stock fail2ban tools out of the box (at least until the "rescan" facility from the RFE above is implemented and released).

But since fail2ban (as well as fail2ban-regex) is a Python module, it would be possible to drive the filter from Python, writing bans to some log or sending them directly to the main fail2ban instance; see https://github.com/fail2ban/fail2ban/issues/2909#issuecomment-1039267423 for an example of such a script.
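For illustration only, the idea of judging each request by its own log timestamp can be sketched in plain Python. This is not the script from the linked issue and it does not use fail2ban's modules; it is a stand-alone approximation that assumes the vhost_combined layout of other_vhosts_access.log, an English month locale, and that a ban triggers once an IP reaches maxretry hits within findtime. The names replay, replay_file, LINE and IGNORED are hypothetical.

```python
import gzip
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

FINDTIME = timedelta(seconds=30)  # findtime from the question
MAXRETRY = 10                     # maxretry from the question

# Assumed line layout (vhost_combined), e.g.:
#   example.com:443 203.0.113.7 - - [10/Oct/2022:13:55:36 +0000] "GET / HTTP/1.1" 200 ...
LINE = re.compile(
    r'^\S+:(?:80|443) (?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?:GET|POST|HEAD) (?P<path>\S+)'
)
# Static assets that should not count as hits (same extensions as the ignoreregex).
IGNORED = re.compile(r'\.(?:ico|jpg|png|gif|js|css|woff|mp4)(?:\?|$)')


def replay(lines):
    """Yield (ip, time) each time an IP reaches MAXRETRY hits within FINDTIME."""
    hits = defaultdict(deque)  # ip -> timestamps still inside the window
    for line in lines:
        m = LINE.match(line)
        if not m or IGNORED.search(m['path']):
            continue
        when = datetime.strptime(m['ts'], '%d/%b/%Y:%H:%M:%S %z')
        window = hits[m['ip']]
        window.append(when)
        # Drop hits that fell out of the findtime window, measured in log time.
        while when - window[0] > FINDTIME:
            window.popleft()
        if len(window) >= MAXRETRY:
            yield m['ip'], when
            window.clear()  # start counting afresh after a simulated ban


def replay_file(path):
    """Print simulated bans for a plain or gzip-compressed access log."""
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rt', errors='replace') as fh:
        for ip, when in replay(fh):
            print(f'{when.isoformat()} would ban {ip}')
```

Calling replay_file('other_vhosts_access.log.8.gz') would then list the simulated bans, which can be checked by hand for false positives. The sketch ignores bantime, so an IP that keeps hammering the server is reported repeatedly.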

Also note that your filter is extremely vulnerable and slow; it is better to rewrite it to be as precise as possible, along these lines:

failregex = ^"<ADDR>" \S+ \S+ [^"]*"[A-Z]+ /(?:\S+/)*[^\.]*(?:\.(?!ico|jpg|png|gif|js|css|woff|mp4)\w+)? [^"]+"
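One concrete imprecision in the original ignoreregex, shown here as a small check: its dots are unescaped, so ".ico" matches any character followed by "ico". The sketch uses Python's re.search as a stand-in for how the expression is applied; the sample log line and the escaped pattern are made up for the demonstration and are not part of the filter above.

```python
import re

# The ignoreregex from the question, verbatim: every "." matches any character.
original = re.compile(r'.*(.ico|.jpg|.png|.gif|.js|.css|.woff|.mp4)')

# Hypothetical stricter variant: a literal dot, and the extension must end the token.
escaped = re.compile(r'\.(?:ico|jpg|png|gif|js|css|woff|mp4)(?:[?\s"]|$)')

# Made-up request for a normal page whose path merely contains "ico".
line = ('example.com:443 203.0.113.7 - - [10/Oct/2022:13:55:36 +0000] '
        '"GET /politico/news HTTP/1.1" 200 512')

original_ignores = bool(original.search(line))  # "tico" satisfies ".ico"
escaped_ignores = bool(escaped.search(line))    # no static-file extension present

print(original_ignores, escaped_ignores)
```

With the original expression this page request would be ignored rather than counted, so a crawler hitting such paths would slip past the jail.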

And last but not least, why do you need this at all? If the jail with such a filter is active and the crawlers come back, they will be banned as soon as they make maxretry failures within the findtime configured for the jail. Preventive banning is not really needed and would just burden your net-filter subsystem with a lot of IPs that would probably never come back.

Thank you, sebres. I could not have asked for a more complete response. I will look into that Python script. Your regex improvements are also very much appreciated. As for my reasons, I am not trying to do preventive banning but to identify false positives. By re-running old logs I hope to find situations where a user would have been banned for legitimate usage. I would then modify the failregex to accept them. Running a backtest would provide faster and more accurate results than manual testing.