I’m running a WooCommerce/WordPress site and noticed GPTBot has been aggressively crawling one specific URL repeatedly, hitting it approximately once every 10 seconds around the clock. I want to flag this because it looks like a bug in how GPTBot handles dynamic query strings, and it’s causing real server load.
What I’m seeing:
GPTBot continuously requests the same base URL with variations of these query strings:
- ?jet_blog_ajax=1&nocache=
- ?nocache=
Each request carries a unique, changing nocache= value, so GPTBot appears to treat every variant as a distinct URL. The result looks like an infinite crawl loop: each response likely embeds yet another fresh nocache= value, so the crawler never exhausts the URL space and keeps coming back.
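The loop mechanics are easy to reproduce. Assuming the crawler deduplicates fetched pages by exact URL string (an assumption about GPTBot's internals, not something I can verify; the URLs below are made up to mimic my logs), every fresh nocache= value defeats deduplication, whereas stripping known cache-buster parameters before comparison would collapse the variants:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Hypothetical URLs mimicking the pattern in my access logs.
urls = [
    "https://example.com/shop/?jet_blog_ajax=1&nocache=1001",
    "https://example.com/shop/?jet_blog_ajax=1&nocache=1002",
]

# Naive dedup by exact URL string: each fresh nocache= value looks new.
seen = set()
for u in urls:
    print("already seen:", u in seen)  # False both times -> refetched
    seen.add(u)

# Dropping known cache-buster params before dedup collapses the variants.
def canonicalize(url, drop=("nocache",)):
    s = urlsplit(url)
    q = [(k, v) for k, v in parse_qsl(s.query) if k not in drop]
    return urlunsplit((s.scheme, s.netloc, s.path, urlencode(q), ""))

print(canonicalize(urls[0]) == canonicalize(urls[1]))  # True
```

Whether GPTBot actually dedupes this way I can't say, but the observed behavior is consistent with the naive version.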
The robots.txt problem:
My robots.txt explicitly disallows these patterns for all bots:

User-agent: *
Disallow: /*?nocache=
Disallow: /*?*nocache=*
GPTBot is ignoring these rules entirely. This is not a misconfiguration on my end: the wildcard syntax is valid under the Robots Exclusion Protocol (RFC 9309), and other well-behaved crawlers respect these rules.
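For anyone who wants to check the patterns themselves: under RFC 9309, `*` in a rule matches any sequence of characters. A minimal matcher (my own sketch, not anyone's production implementation; the sample paths are hypothetical) confirms both offending URL shapes are covered by the two rules:

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate an RFC 9309 path pattern: '*' -> any sequence, '$' -> end anchor."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "$":
            out.append("$")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

rules = ["/*?nocache=", "/*?*nocache=*"]
paths = [
    "/shop/?nocache=1712345678",                  # hypothetical
    "/shop/?jet_blog_ajax=1&nocache=1712345678",  # hypothetical
]
for path in paths:
    blocked = any(robots_pattern_to_regex(r).match(path) for r in rules)
    print(path, "->", "disallowed" if blocked else "allowed")
```

Both sample paths come back disallowed, which is exactly what a compliant crawler should conclude. (Note the second rule is the one that catches `&nocache=` when other parameters come first.)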
Why this matters:
This isn’t just a nuisance. A bot hitting a single endpoint every 10 seconds, indefinitely, generates unnecessary server load, inflates my bandwidth bill, and pollutes my crawl logs. Multiply this across the potentially thousands of sites with similar dynamic query string patterns and this could be a significant infrastructure problem at scale.
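For scale, the arithmetic on one request every 10 seconds against a single URL:

```python
# One request every 10 seconds, around the clock, against one URL.
SECONDS_PER_DAY = 24 * 60 * 60
INTERVAL = 10  # seconds between requests, per my logs

per_day = SECONDS_PER_DAY // INTERVAL
per_month = per_day * 30
print(per_day, per_month)  # 8640 259200
```

That is roughly 8,600 wasted requests a day, over a quarter-million a month, for zero new content.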
Furthermore, this behavior is counterproductive for OpenAI’s own goals. By getting stuck in a loop on a single URL, GPTBot is failing to crawl the rest of my site at all. Any site owner who wants their content indexed for AI training is being poorly served.