I’m trying to allow OpenAI’s bots to access content on our site, but right now we can only do that via robots.txt or server-side checks based on user agents. User agents are easy to imitate, though, which is why we’d prefer to use a reverse proxy with an allowlist of OpenAI’s IP ranges, so that only requests from those ranges can crawl.
The only reason we’re currently blocking OpenAI is that we can’t verify their bots’ IP addresses.
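A minimal sketch of the server-side check described above, in Python, assuming the application sees the real client IP and already has OpenAI’s ranges as CIDR strings. The user-agent tokens (GPTBot, ChatGPT-User, OAI-SearchBot) are, as far as I know, the names OpenAI documents for its crawlers; verify them against their current docs. The function name and the 192.0.2.0/24 range (a reserved documentation block) are illustrative placeholders, not real OpenAI values.

```python
import ipaddress

# User-agent tokens OpenAI documents for its crawlers (verify against their current docs).
OPENAI_UA_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")


def is_verified_openai_bot(user_agent: str, remote_ip: str, allowed_cidrs: list[str]) -> bool:
    """Allow only if the UA claims to be an OpenAI bot AND the IP is in an allowlisted range."""
    # Step 1: the user agent alone proves nothing, but it tells us which check applies.
    if not any(token in user_agent for token in OPENAI_UA_TOKENS):
        return False
    # Step 2: the IP must parse and fall inside one of the allowlisted networks.
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False
    # Mixed IPv4/IPv6 membership tests simply return False, so both families can share one list.
    return any(ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs)


if __name__ == "__main__":
    # 192.0.2.0/24 is a reserved documentation range standing in for real OpenAI prefixes.
    allow = ["192.0.2.0/24"]
    ua = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1"
    print(is_verified_openai_bot(ua, "192.0.2.10", allow))    # True: claimed bot, allowlisted IP
    print(is_verified_openai_bot(ua, "198.51.100.7", allow))  # False: spoofed UA from elsewhere
```

Behind a reverse proxy, `remote_ip` should be the address the proxy itself saw, not a client-supplied X-Forwarded-For value; otherwise the IP check can be spoofed as easily as the user agent.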
Update 24.06.2025: I found the lists:
They’re hosted on https://openai.com:
https://openai.com/chatgpt-user.json
https://openai.com/searchbot.json
https://openai.com/gptbot.json
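A rough sketch for turning these files into an allowlist, in Python. It assumes each file is JSON with a top-level "prefixes" array whose entries carry an "ipv4Prefix" or "ipv6Prefix" CIDR string; inspect the real files before relying on that schema. The function names and the User-Agent string sent when fetching are hypothetical placeholders.

```python
import ipaddress
import json
import urllib.request

# The three files listed in the post.
OPENAI_RANGE_URLS = (
    "https://openai.com/chatgpt-user.json",
    "https://openai.com/searchbot.json",
    "https://openai.com/gptbot.json",
)


def parse_prefixes(doc: dict) -> list:
    """Collect networks from {"prefixes": [{"ipv4Prefix": ...} or {"ipv6Prefix": ...}, ...]}.

    The schema is an assumption; inspect the real files before relying on it.
    """
    networks = []
    for entry in doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr, strict=False))
    return networks


def fetch_openai_networks(urls=OPENAI_RANGE_URLS, timeout: float = 10.0) -> list:
    """Download each file and merge its prefixes into one allowlist."""
    networks = []
    for url in urls:
        # Hypothetical User-Agent; some CDNs may reject the default Python-urllib one.
        req = urllib.request.Request(url, headers={"User-Agent": "ip-allowlist-updater/0.1"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            networks.extend(parse_prefixes(json.load(resp)))
    return networks


def ip_in_networks(remote_ip: str, networks: list) -> bool:
    """True if remote_ip parses and falls in any of the given networks."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False
    return any(ip in net for net in networks)
```

Because the ranges can change, it is probably safer to re-run the fetch periodically (for example from a scheduled job) and regenerate the proxy allowlist than to hardcode the prefixes.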
Thanks,
Julian