Score:0

Is there a way of telling web crawlers / robots about a request limit per second / minute / etc.?


I was thinking of something similar to robots.txt, which well-behaved bots respect when crawling a website. In robots.txt I can define the User-agent, Allow and Disallow directives.

My goal is also to communicate a request rate limit to the bots, telling them for example that they must not exceed xxx requests per second, per minute, etc.

I know how to put a hard limit in place, but the goal here is not to block them.
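(For reference, the kind of hard limit I already have in place is something like nginx's limit_req; the zone name and rate values below are just illustrative.)

```
# In the http{} context: track clients by IP, allow ~10 requests/second per IP
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

server {
    listen 80;

    location / {
        # Apply the limit; allow short bursts of up to 20 requests, no queuing delay
        limit_req zone=perip burst=20 nodelay;
    }
}
```

This blocks anything above the threshold, which is exactly what I want to avoid; I would rather tell the bots the limit up front.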

Score:1

You need to check the bots' home pages for mechanisms to "throttle crawling" (useful search term).

For example, https://developers.google.com/search/docs/crawling-indexing/reduce-crawl-rate is Google's guide on how to reduce Googlebot's crawl rate.

There is also the unofficial Crawl-Delay directive in robots.txt that some bots understand. More details can be found at https://websiteseochecker.com/blog/robots-txt-crawl-delay-why-we-use-crawl-delay-getting-started/.
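A minimal sketch of what that could look like in robots.txt (Crawl-Delay is unofficial, so support varies; crawlers that honour it usually treat the value as seconds to wait between requests):

```
User-agent: *
Disallow: /private/
# Unofficial directive: ask supporting crawlers to wait 10 seconds between requests
Crawl-delay: 10
```

Note that Googlebot ignores Crawl-delay, which is why Google's own crawl-rate guide linked above is the mechanism to use for it.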

Thanks a lot, Tero, very useful.
And by the way, I have a second question: https://webmasters.stackexchange.com/questions/141405/crawl-delay-is-there-a-way-to-set-it-below-one-second. If you have some input to provide, that would be great.