This endpoint is intended solely for file downloads and is backed by an S3 bucket, not a traditional webpage meant for indexing. However, we’ve observed that GPTBot has been sending an excessive number of requests to this endpoint, which has significantly increased our bandwidth usage and incurred substantial traffic costs. These requests are unnecessary and harmful, as they fetch non-HTML content and place an unreasonable load on our infrastructure.
Escalated to OpenAI.
GPTBot user agent information is published at https://platform.openai.com/docs/bots, which you can use to restrict crawling in your robots.txt file. Alternatively, you could email gptbot at openai.com and let us know which domain you’re referencing.
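For example, a robots.txt entry like the following (a minimal sketch, using the GPTBot user agent token documented at the page above) blocks compliant crawling site-wide:

```
# Disallow OpenAI's GPTBot from the entire site
User-agent: GPTBot
Disallow: /
```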
Why is the crawler downloading binary files?
Is this expected?
Web crawlers generally just follow links and don’t know in advance what type of content will be served until they access it. If you’d like to disallow crawlers from visiting all or part of your site, you can do so using the robots.txt file. OpenAI’s crawler user agents are listed at https://platform.openai.com/docs/bots.
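If the downloads live under a single path prefix, a narrower rule keeps the rest of the site crawlable. This is a sketch assuming a hypothetical /downloads/ prefix; substitute your actual endpoint path:

```
# Keep all crawlers out of the download endpoint only
User-agent: *
Disallow: /downloads/
```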
If you have additional questions or concerns about GPTBot, please feel free to email us at gptbot (at) openai.com.