I suddenly noticed hundreds of errors in the log file for a website I created. They are all caused by OpenAI's bot: HTTP_FROM says gptbot(at)openai.com and USER_AGENT is Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
I looked into one example, and the requested path seems to go way out of bounds. This is the path: /Odonto-lieyectoresli-541.aspx/assets/js/plugins/Docs/Productos/assets/js/Docs/Productos/assets/js/assets/js/assets/js/vendor/images2021/Docs/Productos/Docs/Productos/assets/js/vendor/Docs/Productos/Docs/Productos/Docs/Menu/Odonto-Gomas-para-pulido-de-composite-tipo-Enhace-1815.aspx
Before blocking the bot I am writing to find out what’s wrong with my website.
Same problem here, with very long URLs containing a seemingly endless number of &:
20.171.206.213 - - [29/Oct/2024:10:56:16 +0100] “GET /298-bois-chene-moderne?amp%3Bamp%3Border=product.price.desc&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Classique+Chic&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bamp%3Bq=Style-Design&%3Bamp%3Bq=Style-Design&%3Border=product.name.desc&order=product.name.desc HTTP/1.1” 200 25215 “-” “Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)”
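In case it helps anyone spot these requests in their own logs: the keys in that query string look like an HTML-escaped `&amp;` that was re-escaped on every hop, so each parameter name picks up a growing `amp;` prefix. Below is a rough Python sketch that strips those prefixes and counts how many keys were affected. The function name `clean_query` and the regex are my own, and the idea that the prefixes come from repeated escaping is an assumption, not something confirmed in this thread.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Matches one or more leaked entity fragments ("amp;" or a stray ";")
# at the start of a query parameter name.
_LEAKED_AMP = re.compile(r"^(?:amp;|;)+")


def clean_query(url: str) -> tuple[list[tuple[str, str]], int]:
    """Return (cleaned key/value pairs, number of keys that had leaked prefixes)."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    cleaned = []
    polluted = 0
    for key, value in pairs:
        new_key = _LEAKED_AMP.sub("", key)
        if new_key != key:
            polluted += 1
        cleaned.append((new_key, value))
    return cleaned, polluted


if __name__ == "__main__":
    # Shortened version of the request above, same shape.
    url = ("/298-bois-chene-moderne?amp%3Bamp%3Border=product.price.desc"
           "&%3Bamp%3Bq=Style-Design&order=product.name.desc")
    pairs, polluted = clean_query(url)
    print(pairs, polluted)
```

A non-zero count is a reasonable signal that the request came from a crawler following a mangled link rather than from a real visitor.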
Same here. I took the time to come here and report the problem, but I fear the ChatGPT developers won't read us; there is no “BUG REPORT: your AI is broken, please fix this” ticketing system... If we get no answer quite fast, I'll also block OpenAI, via robots.txt or, better, a firewall dropping the whole IP range:
NetRange: 20.160.0.0 - 20.175.255.255
CIDR: 20.160.0.0/12
NetName: MSFT
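For the robots.txt option, a quick way to sanity-check a rule before deploying it is Python's standard `urllib.robotparser`. This is only a sketch: the `ROBOTS_TXT` content and the `can_fetch` helper are my own example, and it only tells you what a well-behaved crawler should do with the file, not what GPTBot actually does.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (an assumption for illustration): block GPTBot
# everywhere, leave every other crawler alone.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets the given user-agent token fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Pass the product token ("GPTBot"), not the full User-Agent header.
    print(can_fetch("GPTBot", "/298-bois-chene-moderne"))
    print(can_fetch("SomeOtherBot", "/298-bois-chene-moderne"))
```

robots.txt is purely advisory, so if the crawler ignores it, the firewall route is the only one that actually stops the traffic.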
In the end I had no choice: those GPTBot attacks were overloading my dedicated server, so I had to reject all GPTBot requests.
From my side, none of the errors reported above are showing up anymore. I don’t know if something was fixed or if the crawler just isn’t coming around anymore.
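For anyone who wants to do something similar at the application level, here is a rough sketch as a Python WSGI middleware. This is not the configuration the poster used; `reject_gptbot`, `is_blocked` and the substring match on the User-Agent are assumptions for illustration.

```python
# Assumption: the crawler is identified by this substring in its User-Agent.
BLOCKED_TOKEN = "gptbot"


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive check for the crawler token in a User-Agent header."""
    return BLOCKED_TOKEN in user_agent.lower()


def reject_gptbot(app):
    """Wrap a WSGI app so matching requests get a 403 before reaching it."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Rejecting in the web server or firewall is cheaper than doing it in the application, since the request never reaches your code, but the matching logic is the same idea.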
I have the same problem - GPTBot/1.2 is hitting nonexistent endpoints on my site thousands of times each day, filling up my logs and depleting my Rollbar credits.
whois says that block is owned by Microsoft, so I suspect OpenAI is running the bot on Azure:
NetRange: 20.160.0.0 - 20.175.255.255
CIDR: 20.160.0.0/12
NetName: MSFT
NetHandle: NET-20-160-0-0-1
Parent: NET20 (NET-20-0-0-0-0)
NetType: Direct Allocation
OriginAS:
Organization: Microsoft Corporation (MSFT)
RegDate: 2017-02-22
Updated: 2017-02-22
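If you want to check whether an address from your own logs falls inside that block, Python's standard `ipaddress` module can do it. A small sketch (the helper name `in_block` is mine; the CIDR is the one from the whois output above):

```python
import ipaddress

# CIDR taken from the whois output above.
MSFT_BLOCK = ipaddress.ip_network("20.160.0.0/12")


def in_block(ip: str, network: ipaddress.IPv4Network = MSFT_BLOCK) -> bool:
    """Return True if the IPv4 address lies inside the given network."""
    return ipaddress.ip_address(ip) in network


if __name__ == "__main__":
    # Address from the access log line quoted earlier in the thread.
    print(in_block("20.171.206.213"))
    # First and last address of the block, and its size.
    print(MSFT_BLOCK[0], MSFT_BLOCK[-1], MSFT_BLOCK.num_addresses)
```

Keep in mind that a /12 is over a million addresses registered to Microsoft, so dropping the whole range will also block anything else hosted there, not just this crawler.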
Apparently GPTBot scrapes that URL and tries to HTTP “GET” it, which is meaningless in the context of the application. My Rails server was serving a 404 for it.
I’m not clear on why the bot decided to slam those forms with thousands of requests.