GPTBot ignoring robots.txt and hammering single URL in a loop — potential infinite crawl bug

I’m running a WooCommerce/WordPress site and noticed GPTBot has been aggressively crawling one specific URL repeatedly, hitting it approximately once every 10 seconds around the clock. I want to flag this because it looks like a bug in how GPTBot handles dynamic query strings, and it’s causing real server load.

What I’m seeing:

GPTBot continuously requests the same base URL with variations of these query strings:

  • ?jet_blog_ajax=1&nocache=

  • ?nocache=

Each variation has a unique or changing nocache= value, which means GPTBot appears to be treating each request as a distinct URL. This is creating what looks like an infinite crawl loop — it never stops returning to this URL because each response likely contains a new variation of the query string.
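To illustrate the loop: every distinct `nocache=` value makes the URL unique to a crawler that compares raw strings, even though the page behind it is the same. A minimal sketch of the canonicalization a crawler could do to collapse these variants (the parameter names are taken from the logs above; the function is illustrative, not GPTBot's actual behavior):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative: query params that act as cache-busters and should not
# create "new" URLs from the crawler's point of view.
CACHE_BUSTERS = {"nocache", "jet_blog_ajax"}

def canonicalize(url: str) -> str:
    """Strip cache-buster params so all variants map to one crawl key."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in CACHE_BUSTERS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))

a = canonicalize("https://example.com/blog/?jet_blog_ajax=1&nocache=1712345")
b = canonicalize("https://example.com/blog/?nocache=9988776")
assert a == b == "https://example.com/blog/"
```

Without some normalization like this, each response that embeds a fresh `nocache=` value hands the crawler another "new" URL, and the loop never terminates.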

The robots.txt problem:

My robots.txt explicitly disallows these patterns for all bots:

Disallow: /*?nocache=
Disallow: /*?*nocache=*

GPTBot is ignoring these rules entirely. This is not a misconfiguration on my end — the rules are valid and other well-behaved crawlers respect them.
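For reference, the Robots Exclusion Protocol (RFC 9309) requires crawlers to support `*` (any character sequence) and a trailing `$` (end anchor) in path rules. A minimal sketch of the matching a compliant crawler performs, showing that both Disallow rules above do match the URLs GPTBot keeps fetching:

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Convert a robots.txt path rule to a regex per RFC 9309:
    '*' matches any character sequence; a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + pattern + ("$" if anchored else ""))

def disallowed(rule: str, path_and_query: str) -> bool:
    """True if the rule matches the request path (prefix match from '/')."""
    return rule_to_regex(rule).match(path_and_query) is not None

# Both rules from the robots.txt above match the looping requests:
assert disallowed("/*?nocache=", "/some-page/?nocache=1712345")
assert disallowed("/*?*nocache=*", "/some-page/?jet_blog_ajax=1&nocache=99")
# And neither blocks a normal page:
assert not disallowed("/*?*nocache=*", "/some-page/")
```

So a crawler that implements the spec's wildcard matching would skip these URLs entirely; ignoring them is a compliance bug, not a site misconfiguration.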

Why this matters:

This isn’t just a nuisance. A bot hitting a single endpoint every 10 seconds indefinitely generates unnecessary server load, inflates my bandwidth, and pollutes crawl logs. Multiply this across potentially thousands of sites with similar dynamic query string patterns and this could be a significant infrastructure problem at scale.

Furthermore, this behavior is counterproductive for OpenAI’s own goals. By getting stuck in a loop on a single URL, GPTBot is failing to crawl the rest of my site at all. Any site owner who wants their content indexed for AI training is being poorly served.

It’s entirely plausible that someone is pretending to be GPTBot. Have you checked the IP address(es) attached to the requests?

You can match them against the published ranges here, and going forward verify that any request claiming the GPTBot user agent actually comes from one of these IPs:

https://openai.com/gptbot.json
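As a sketch of that verification, assuming the published file follows the common format for crawler IP lists (a `prefixes` array with `ipv4Prefix`/`ipv6Prefix` entries; check the actual file, since the key names here are an assumption):

```python
import ipaddress
import json
from urllib.request import urlopen

def load_prefixes(url: str = "https://openai.com/gptbot.json") -> list:
    """Fetch the published ranges. The 'prefixes'/'ipv4Prefix' keys are an
    assumption based on the usual crawler-IP-list format."""
    with urlopen(url) as resp:
        data = json.load(resp)
    nets = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            nets.append(ipaddress.ip_network(cidr))
    return nets

def is_real_gptbot(client_ip: str, nets: list) -> bool:
    """True if the client IP falls inside any published range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in nets)

# Offline example with a made-up prefix list (TEST-NET-3, illustrative only):
nets = [ipaddress.ip_network("203.0.113.0/24")]
assert is_real_gptbot("203.0.113.7", nets)
assert not is_real_gptbot("198.51.100.5", nets)
```

If the logged IPs fall outside the published ranges, this is an impersonator and a server-level block is the right fix rather than a GPTBot bug report.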

Lastly, you could be more explicit and add a GPTBot-specific rule to your robots.txt (though if GPTBot is already ignoring the wildcard rules, this may not help):

User-agent: GPTBot
Disallow: /*?*nocache=*