Score:1

nginx request limit creates 404 responses except in the limit-exceeded case


Given the following configuration (reduced to the relevant parts):

/etc/nginx/nginx.conf:

http {
  # ... general configuration stuff here ...

  map $http_user_agent $isbot_ua {
    default 0;
    ~*(GoogleBot|bingbot|YandexBot|mj12bot|PetalBot|SemrushBot|AhrefsBot|DotBot|oBot) 1;
  }

  map $isbot_ua $limit_bot {
    0       "";
    1       $binary_remote_addr;
  }

  limit_req_zone $limit_bot zone=bots:10m rate=2r/m;
  limit_req_log_level warn;
  limit_req_status 429;

  include sites.d/vhost_*.conf;
}

/etc/nginx/sites.d/vhost_example.org.conf:

server {
  # ... general vhost config here ...

  location / {
    index index.php index.html index.htm;
    try_files $uri $uri/ /index.php$is_args$args;
  }

  location ~ ^(.+?\.php)(/.*)?$ {
    try_files /does-not-exist-099885c5caef6f8ea25d0ca26594465a.htm @php;
  }

  location @php {
    try_files $1 =404;

    include /etc/nginx/fastcgi_params;
    fastcgi_split_path_info ^(.+\.php)(/.+)\$;
    fastcgi_param SCRIPT_FILENAME $document_root$1;
    fastcgi_param PATH_INFO $2;
    fastcgi_param HTTPS on;
    fastcgi_pass unix:/var/lib/php/11-example.org-php-fpm.socket;
    fastcgi_index index.php;
  }
}

/etc/nginx/fastcgi_params:

limit_req zone=bots burst=5 nodelay;

# ... more fastcgi_param here ...

The issue is the following:

Each request matching a bot UA (no matter whether via a virtual URL mapping to /index.php or a native URL pointing directly to index.php) results in a 404 response instead of the expected 200 - unless I exceed the rate limit, at which point it suddenly responds with the expected 429.

No 404 or 429 code is generated if I change the map to:

  map $request_uri $is_req_limited {
    default 0;
#    ~*(GoogleBot|bingbot|YandexBot|mj12bot|PetalBot|SemrushBot|AhrefsBot|DotBot|oBot) 1;
  }

In this case, all requests are answered with 200. This is also true if I do not match any of the bots.

The thing is: it worked correctly in our pre-deployment tests, which had a simpler vhost config (we moved limit_req from the global config to the fastcgi section during deployment because we only want to limit page generation; cached pages and static resources are fine). This totally killed SEO rankings for our sites.
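
For comparison, in the pre-deployment tests the limiter simply sat in the global config next to the zone definition, roughly like this (reconstructed from memory, so take the exact placement with a grain of salt):

http {
  # ... zones and maps as shown above ...
  limit_req_zone $limit_bot zone=bots:10m rate=2r/m;

  # pre-deployment: limiter applied globally, i.e. to every request in every vhost
  limit_req zone=bots burst=5 nodelay;

  include sites.d/vhost_*.conf;
}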

Command used for testing:

# Causes the problem:
for i in $(seq 1 30); do curl -Is -A GoogleBot https://example.org/ | head -n1; done

# Does not cause the problem:
for i in $(seq 1 30); do curl -Is -A ThisIsNotABot https://example.org/ | head -n1; done

Is this a bug or a misconfiguration? If it is a bug, is it possible to work around it?

Side note: it is almost impossible to avoid this somewhat strange configuration because it is generated by the host management software (Froxlor), though I think it may play into the problem. We also cannot add or modify any configuration here:

location ~ ^(.+?\.php)(/.*)?$ {
  try_files /does-not-exist-099885c5caef6f8ea25d0ca26594465a.htm @php;
}

location @php {
  try_files $1 =404;
  #...

I wonder if limit_req would be better placed inside location ~ ^(.+?\.php)(/.*)?$, but OTOH location @php should be equally fine.
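
To illustrate what I mean, it would look roughly like this if Froxlor let me touch that block (untested):

location ~ ^(.+?\.php)(/.*)?$ {
  # hypothetical placement: rate-limit bot requests in the PHP location itself
  limit_req zone=bots burst=5 nodelay;
  try_files /does-not-exist-099885c5caef6f8ea25d0ca26594465a.htm @php;
}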

Paul avatar
What is the "host management software"?
hurikhan77 avatar
It's Froxlor, I added it to the text
Score:0

Ouch, this is a tough one, especially because the configuration is generated by Froxlor, if I understood your story correctly. However, I can try to point you in the right direction so you can check in with the Froxlor devs.

Based on my Nginx knowledge and as far as I understood from your question, this does not look like a bug, but rather a configuration/ordering issue. Here is why I think so:

When a request comes in for /, Nginx first checks the location blocks to decide where to route it. In your configuration the request gets routed to the @php location because of your try_files directive. The limit_req directive applied in the fastcgi_params file therefore only affects requests that are handled by FastCGI, but the bots are hitting /, as you describe, which is handled directly by Nginx.

My 2 cents: if you move the limit_req directive into the location / configuration block like this:

location / {
    index index.php index.html index.htm;
    try_files $uri $uri/ /index.php$is_args$args;
    limit_req zone=bots burst=5 nodelay;
}

You should notice that all incoming requests, whether they are handled by Nginx or FastCGI (the bots, in your case), are subject to rate limiting, and you shouldn't get the sudden, weird 404 errors anymore.

Check here for more info: Rate limiting in Nginx

hurikhan77 avatar
Yes, the problem is probably somewhere around that configuration. But we explicitly want to limit bots only on generated content; neither cached nor static content should be limited. It shouldn't fall back to "404" when `limit_req` matches a bot **before** the limiter triggers. The crazy thing is: it properly returns 429 **after** hitting the limiter. I would understand a "404" **after** hitting the limit because of `try_files $1 =404`.
Bombaci avatar
Aah okay, now I understand why you initially added it into the `@php` block; that changes the order then. You should move `limit_req zone=bots burst=5 nodelay;` into the `@php` block, but at a higher level in the block. Try adding it, and testing, both before and after the `try_files` directive.
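
Just to sketch what I mean (untested, and assuming Froxlor lets you get such a change in at all):

location @php {
  # placed at the top of the block, as suggested, ahead of the try_files directive
  limit_req zone=bots burst=5 nodelay;
  try_files $1 =404;

  # ... fastcgi_param / fastcgi_pass settings as before ...
}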
hurikhan77 avatar
Sounds like I'll have to patch Froxlor then to achieve that. One more of many patches to fix quirks in that software. If reordering the configs helps, I'll mark your answer as accepted.
Bombaci avatar
Since I don't have much experience with Froxlor, I did some research for you. The documentation doesn't mention anything about overriding the generated Nginx configuration. However, I came across an old forum post saying you can switch the domain to "e-mail only" to prevent vhost generation, which would allow you to make changes to the domain vhost yourself. Maybe worth a try? And I am certain that it has to do with the order.
Bombaci avatar
Specifically this post: [Froxlor Forum Post](https://forum.froxlor.org/index.php?/topic/13156-how-to-change-a-vhosts-configuration/#comment-30597)