
Block the ChatGPT bot at the nginx level


I want to block ChatGPT's crawler from a whole website, but still allow it to read the robots.txt file located in the public root directory (my robots.txt also tells ChatGPT not to crawl the site).

I am blocking it as follows:

# Disallow chatGPT bot
location / {
    if ($http_user_agent ~* "gptbot") {
        return 401;
    }
}

But I want it to be able to access the robots.txt file.

I tried doing:

if ($http_user_agent ~* "GPTBot") {
    if ($request_uri != /robots.txt) {
        return 403;  # Forbidden
    }
}

But it fails.

I also tried:

location / {
    if ($http_user_agent ~* "gptbot") {
        return 401;
    }
}

location = /robots.txt {
    allow all;
    log_not_found off;
    access_log off;
}

And also a variation of this using map:

map $http_user_agent $block_gptbot {
    default         0;
    ~*gptbot        1;
}

server {

location / {
    if ($block_gptbot) {
        return 401;
    }
}

...

But in both cases the whole site is blocked: GPTBot cannot access anything (it gets the 401), not even robots.txt.

vn flag
Why double it up like this? OpenAI's bot respects robots.txt already.
cn flag
@ceejayoz Makes sense, thanks for the feedback. And out of curiosity and to learn the abstract version of this issue for next time, how would you approach this case?
Richard Smith avatar
jp flag
Allow the `robots.txt` URL using: `location = /robots.txt {}`
cn flag
Thanks @RichardSmith - But this also fails. I updated the question with more specific information.
Richard Smith avatar
jp flag
If "the nginx server doesn't even start" - check the error log and test the configuration using `nginx -t`
cn flag
@RichardSmith Thanks, didn't know about that. There was a duplicate `location /` rule. Now there isn't, and the `location = /robots.txt {}` follows the block rule, but still robots.txt cannot be accessed. The whole site cannot be accessed by gpt bot.
vn flag
In this specific case, isn't that what you want?
cn flag
@ceejayoz In this specific case, I'd like chatgpt to still see the robots.txt. But I will just block the whole site and forget about it. Thanks!
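
For the abstract version of the problem raised in the comments, one possible approach is to make the whole decision in `map` blocks, combining the user agent and the URI, and then issue a single server-level `return`. This sidesteps nested `if` blocks, which nginx does not allow, and it does not depend on which `location` ends up handling the request. This is only a sketch: the variable names `$is_gptbot` and `$block_gptbot`, the `server_name`, and the `root` path are placeholders rather than values from the original configuration, and 403 is used instead of 401 on the assumption that no authentication is being offered.

```nginx
# Both map blocks belong in the http context, outside any server block.

# 1 if the User-Agent contains "gptbot" (case-insensitive), else 0.
map $http_user_agent $is_gptbot {
    default   0;
    ~*gptbot  1;
}

# Combine the flag with the normalized URI. Block when the flag is 1
# and the URI is anything other than exactly /robots.txt.
map "$is_gptbot:$uri" $block_gptbot {
    default                  0;
    "~^1:(?!/robots\.txt$)"  1;
}

server {
    listen 80;
    server_name example.com;   # placeholder
    root /var/www/html;        # placeholder

    # Evaluated before location matching, so it applies site-wide.
    if ($block_gptbot) {
        return 403;
    }

    location = /robots.txt {
        log_not_found off;
        access_log off;
    }

    location / {
        # regular site configuration goes here
    }
}
```

After checking the configuration with `nginx -t` and reloading, the behaviour can be tested by sending a request with a matching user agent, for example `curl -I -A "GPTBot" http://example.com/robots.txt` and the same command against another path. If robots.txt is still refused, a leftover `if` or `return` elsewhere in the configuration is a likely suspect.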