Bots generating errors on my website

I suddenly noticed hundreds of errors in the log file for a website I created. They are all caused by OpenAI bots: HTTP_FROM says gptbot(at)openai.com and USER_AGENT is Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

I looked into one example and the requested path seems to go way out of bounds. This is the path: /Odonto-lieyectoresli-541.aspx/assets/js/plugins/Docs/Productos/assets/js/Docs/Productos/assets/js/assets/js/assets/js/vendor/images2021/Docs/Productos/Docs/Productos/assets/js/vendor/Docs/Productos/Docs/Productos/Docs/Menu/Odonto-Gomas-para-pulido-de-composite-tipo-Enhace-1815.aspx

Before blocking the bot I am writing to find out what’s wrong with my website.
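
One possible cause, offered as a guess rather than a diagnosis: if the pages use relative links (no leading slash) and the server still returns a page when extra path segments follow the .aspx name, every hop a crawler makes resolves the same relative links against a longer URL. The sketch below reproduces that growth with Python's standard URL resolution. The host example.com and the file names are made up for illustration.

```python
from urllib.parse import urljoin


def crawl_chain(start, relative_links):
    """Resolve each link against the URL reached by the previous link."""
    urls = [start]
    for link in relative_links:
        urls.append(urljoin(urls[-1], link))
    return urls


# Hypothetical starting point: a page URL followed by a trailing slash,
# which some servers still answer with the page itself.
start = "https://example.com/page-541.aspx/"

# Hypothetical relative links of the kind found in the page markup.
links = [
    "Docs/Productos/item.aspx",
    "assets/js/vendor/lib.js",
    "Docs/Menu/other-1815.aspx",
]

chain = crawl_chain(start, links)
for url in chain:
    print(url)

# The same last link written root-relative (leading slash) does not grow.
root_relative = urljoin(chain[-2], "/Docs/Menu/other-1815.aspx")
print(root_relative)
```

If this is what is happening, using root-relative or absolute links, or returning 404 for URLs with trailing path segments, should stop the paths from growing.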

Same problem here, with very long URLs containing a seemingly endless run of &amp;:
20.171.206.213 - - [29/Oct/2024:10:56:16 +0100] “GET /298-bois-chene-moderne?amp%3Bamp%3Border=product.price.desc&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&amp%3Bamp%3Bq=Style-Design&amp%3Border=product.name.desc&order=product.name.desc HTTP/1.1” 200 25215 “-” “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)”

Please FIX THIS or stop crawling.
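
For what it is worth, one mechanism that produces URLs of this shape is a page that echoes its current query parameters back into its own links and HTML-escapes the "&" one time too many. Each time a client follows such a link, the parameter name gains another "amp;". The sketch below simulates that loop. It is an assumption about the cause, not a confirmed diagnosis, and render_link and follow are hypothetical helpers, not code from any shop software.

```python
from html import escape, unescape
from urllib.parse import parse_qsl, urlencode


def render_link(query):
    """Hypothetical buggy template: rebuilds the query, then escapes twice."""
    href = "?" + urlencode(parse_qsl(query, separator="&"))
    once = escape(href, quote=False)   # correct for an HTML attribute
    return escape(once, quote=False)   # the bug: escaped a second time


def follow(html_href):
    """What a client does: decode HTML entities once, then request the URL."""
    return unescape(html_href).lstrip("?")


def crawl(query, hops):
    """Follow the page's own link repeatedly, starting from a clean query."""
    for _ in range(hops):
        query = follow(render_link(query))
    return query


clean = "q=Style-Design&order=product.name.desc"
after_three = crawl(clean, 3)
keys = [name for name, _ in parse_qsl(after_three, separator="&")]
print(after_three)
print(keys)
```

A browser user rarely clicks deep enough to notice this, but a crawler that follows every link will keep going until something stops it.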

Glad to see I am not the only one.

More than 4000 errors today. I am considering blocking the bot through robots.txt.
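
If you go the robots.txt route, here is a minimal sketch of a rule set that shuts out GPTBot only, checked with Python's standard robots.txt parser. The example.com URL and the "SomeOtherBot" name are placeholders. Keep in mind that robots.txt only helps with crawlers that honor it, and it does not cancel requests already queued.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content: block GPTBot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any path on the site gives the same answer here.
url = "https://example.com/any/page"

gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("Other bots allowed:", other_allowed)
```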

Same here. I took the time to come here and report the problem, but I fear the ChatGPT developers won’t read this: there is no “BUG REPORT: your AI is broken, please fix this” ticketing system. If we don’t get an answer quite fast, I’ll also block OpenAI, via robots.txt or, better, a firewall rule dropping the whole IP range:
NetRange: 20.160.0.0 - 20.175.255.255
CIDR: 20.160.0.0/12
NetName: MSFT

In the end I had no choice: those GPTBot attacks were overloading my dedicated server, so I had to reject all GPTBot requests:

ufw insert 1 deny from 20.171.206.0/24

For now, until OpenAI fixes the buggy pseudo-AI.

On my side, the errors reported above are no longer showing up. I don’t know whether something was fixed or the crawler simply isn’t coming around anymore.

Thanks for coming back to let us know!

Hi, same thing for me. I’m in France, the bot comes every 3 days, and my website shows error 500…

I have the same problem.
Even after blocking it through robots.txt, it still keeps taking my site down.
I am now going to wait 24 hours and see.

/midias/?amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;page=6&page=4&page=6&page=6&page=5&page=6&page=5&page=3&page=3&page=5&page=4&orgao=8&page=5

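# Block IPs and IP ranges
# mod_rewrite must be switched on for the rules below (harmless if already enabled elsewhere in the file)
RewriteEngine On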
RewriteCond %{REMOTE_ADDR} ^52\.230\.152\. [OR]
RewriteCond %{REMOTE_ADDR} ^52\.233\.106\. [OR]
RewriteCond %{REMOTE_ADDR} ^20\.171\.206\. [OR]
RewriteCond %{REMOTE_ADDR} ^20\.171\.207\. [OR]
RewriteCond %{REMOTE_ADDR} ^4\.227\.36\.([0-9]|[0-9][0-9]|1[0-1][0-9]|12[0-5])$ [OR]
RewriteCond %{REMOTE_ADDR} ^20\.42\.10\.1(7[6-9]|8[0-3])$ [OR]
RewriteCond %{REMOTE_ADDR} ^172\.203\.190\.1(2[8-9]|3[0-9])$ [OR]
RewriteCond %{REMOTE_ADDR} ^51\.8\.102\. [OR]
RewriteCond %{REMOTE_ADDR} ^189\.89\.12\.150$
RewriteRule ^ - [F,L]

Looks like someone figured out how to use ChatGPT to DDoS websites, and no one at (Open)ClosedAI cares.

I have the same problem - GPTBot/1.2 is hitting nonexistent endpoints on my site thousands of times each day, filling up my logs and depleting my Rollbar credits.

Are you sure it is GPTBot? Did you check the IP?

Good question.

Yeah, the IP is OpenAI.

It was 20.171.207.223, which is listed in https://openai.com/gptbot.json (as “20.171.207.0/24”)
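
For anyone who wants to repeat that check against their own logs, here is a small sketch using Python's ipaddress module. It only hard-codes the one range quoted in this thread, so treat PUBLISHED_RANGES as an assumption and refresh it from the JSON file yourself. The last lines also show why dropping the whole whois block is a blunt instrument.

```python
import ipaddress

# Range quoted in this thread from gptbot.json; the real list may differ.
PUBLISHED_RANGES = ["20.171.207.0/24"]

# Whole Microsoft allocation from the whois output in this thread.
WHOIS_BLOCK = "20.160.0.0/12"


def is_in_ranges(ip, cidrs):
    """Return True if the address falls inside any of the CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


print(is_in_ranges("20.171.207.223", PUBLISHED_RANGES))  # address from my logs
print(is_in_ranges("203.0.113.7", PUBLISHED_RANGES))     # documentation address

# Number of addresses a firewall rule on the whole whois block would drop.
block_size = ipaddress.ip_network(WHOIS_BLOCK).num_addresses
print(block_size)
```

A user agent string is trivial to fake, so matching the source IP against the published ranges is the more reliable signal that the traffic really is GPTBot.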

whois says that block is owned by Microsoft, so I suspect OpenAI is running the bot on Azure:
NetRange: 20.160.0.0 - 20.175.255.255
CIDR: 20.160.0.0/12
NetName: MSFT
NetHandle: NET-20-160-0-0-1
Parent: NET20 (NET-20-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: Microsoft Corporation (MSFT)
RegDate: 2017-02-22
Updated: 2017-02-22

And the pages never existed?

Let me be more precise - the “GET” endpoint never existed, only “POST” endpoints for forms.

Users (and GPTBot) are served form code that begins like this:

<form data-turbo="true" method="post" action="/twitter/twitter_user/261377/upvote">

Apparently GPTBot scrapes that URL and tries to HTTP “GET” it, which is meaningless in the context of the application. My Rails server was serving a 404 for it.

I’m not clear on why the bot decided to slam those forms with thousands of requests.

@vb could you look into this?

Thanks for flagging this, we’re looking into it. We do monitor gptbot(at)openai.com if you’d like to reach out directly.