WAF protection on https://openai.com/gptbot-ranges.txt - 403 Forbidden

I am not sure if this is the right place to raise this question, but I was not able to get any help from the chat specialist.

We are getting 403 Forbidden responses when requesting the https://openai.com/gptbot-ranges.txt endpoint.
We use a custom script to periodically fetch this file and update our firewall rules based on its contents.
A few days ago we started to receive consecutive 403 Forbidden responses.
It looks like this endpoint is protected by Cloudflare. If I send a request from the same source IP, but from a web browser, the traffic is allowed and I get a 200 OK reply with the list of IP ranges used by OpenAI.

I believe that automated traffic should be allowed to this endpoint, so it should be whitelisted on Cloudflare so that requests made by scripts are not blocked.
Or is there a way to contact the team managing the WAF (Cloudflare) so I can provide more details about this issue?

Here is an example of a blocked request:

curl -I https://openai.com/gptbot-ranges.txt
HTTP/2 403
date: Tue, 28 May 2024 11:38:45 GMT
content-type: text/html; charset=UTF-8
content-length: 15734
accept-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
critical-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
cross-origin-embedder-policy: require-corp
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: same-origin
origin-agent-cluster: ?1
permissions-policy: accelerometer=(),autoplay=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
referrer-policy: same-origin
x-content-options: nosniff
x-frame-options: SAMEORIGIN
cf-mitigated: challenge
cf-chl-out: IK8jVAWCs5ca3AzGbBqEJuz1Cxp4Tf+lCxYrWvpVRBo33iQeBsNYPjLa7WusXE210JapP4GansWlVt9g0mVSGMaDXkjWyyk5Mavlo6jJloPTasufLdWTmx6StkLek+P72mRPgHVZRM3tplpdaU6aiw==$7CVru0JZZJ6T9/XBgtHGpA==
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
set-cookie: __cf_bm=u5tZ_deRdYemMAUEqbH2xlV1OhQdMqcJcjZQf4jKiGQ-1716896325-1.0.1.1-uVQIrq370odBpIOQi8iIruiB4BwI.C.5ap_hoORoZ81GX8Rm3HDwplWlyNB4fj0Wsx1_3DW73Dc2Gpd5sbGI.w; path=/; expires=Tue, 28-May-24 12:08:45 GMT; domain=.openai.com; HttpOnly; Secure; SameSite=None
x-content-type-options: nosniff
server: cloudflare
cf-ray: 88adf5d23e41b1a0-WAW
alt-svc: h3=":443"; ma=86400

And here is one that succeeds, sent with browser-like headers:

curl -I 'https://openai.com/gptbot-ranges.txt' \
  -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
  -H 'accept-language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7' \
  -H 'cache-control: no-cache' \
  -H 'pragma: no-cache' \
  -H 'priority: u=0, i' \
  -H 'sec-ch-ua: "Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  -H 'sec-fetch-dest: document' \
  -H 'sec-fetch-mode: navigate' \
  -H 'sec-fetch-site: none' \
  -H 'sec-fetch-user: ?1' \
  -H 'upgrade-insecure-requests: 1' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
HTTP/2 200
date: Tue, 28 May 2024 11:39:43 GMT
content-type: text/plain; charset=utf-8
content-length: 32
accept-ranges: bytes
access-control-allow-origin: *
age: 275360
cache-control: public, max-age=0, must-revalidate
content-disposition: inline; filename="gptbot-ranges.txt"
etag: "4635b1edcbc0e2f3239222285f16af75"
strict-transport-security: max-age=63072000
x-matched-path: /gptbot-ranges.txt
x-vercel-cache: HIT
x-vercel-id: arn1::fc92r-1716896383784-3f538073c369
cf-cache-status: DYNAMIC
set-cookie: __cf_bm=XRE5IaUEElGa.14hRahQZ3FuYcP.2mgrzdN83W_zlgk-1716896383-1.0.1.1-wj_RQubqeagMVngfNtMl6PouL4W5mN0iTSwyAYAhOyrXh5ou6puKYT9pZdeazCqf8iA6z60BADaEIaQ2QUCcMQ; path=/; expires=Tue, 28-May-24 12:09:43 GMT; domain=.openai.com; HttpOnly; Secure; SameSite=None
x-content-type-options: nosniff
set-cookie: _cfuvid=293tOfg1S8LGIM2TU87JoLNWGD.s97KV9xFNU6vOrq0-1716896383801-0.0.1.1-604800000; path=/; domain=.openai.com; HttpOnly; Secure; SameSite=None
server: cloudflare
cf-ray: 88adf73e6bb7b247-WAW
alt-svc: h3=":443"; ma=86400
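
For a script that updates firewall rules from this endpoint, it may help to detect the challenge response explicitly so that a blocked fetch never overwrites the existing rules. The sketch below is only an illustration based on the two responses shown above: it treats a 403 carrying the cf-mitigated: challenge header as a Cloudflare challenge, and accepts only a 200 text/plain response as a source of new ranges. The function names is_cloudflare_challenge and should_update_rules are hypothetical, not part of any library.

```python
from typing import Mapping


def is_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Return True when a response looks like the blocked one shown above."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    mitigated = lowered.get("cf-mitigated", "").strip().lower()
    return status == 403 and mitigated == "challenge"


def should_update_rules(status: int, headers: Mapping[str, str]) -> bool:
    """Accept only a 200 plain-text response as a source of new ranges."""
    if is_cloudflare_challenge(status, headers):
        return False
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "").lower()
    return status == 200 and content_type.startswith("text/plain")


# Headers copied (in part) from the two curl transcripts above.
blocked = {"content-type": "text/html; charset=UTF-8", "cf-mitigated": "challenge"}
allowed = {"content-type": "text/plain; charset=utf-8"}

print(should_update_rules(403, blocked))  # False
print(should_update_rules(200, allowed))  # True
```

With a check like this, the script can keep the previously fetched ranges and raise an alert whenever the endpoint answers with a challenge instead of the list.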

Here’s a workaround that drives a real Firefox browser:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

url = "https://openai.com/gptbot-ranges.txt"

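# Set up a headless Firefox browser.
# (The options.headless attribute was removed in recent Selenium 4 releases,
# so the command-line flag is passed instead.)
options = Options()
options.add_argument("-headless")

# geckodriver must be available; recent Selenium versions can locate it
# automatically, otherwise make sure it is on your PATH.
driver = webdriver.Firefox(options=options)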

try:
    driver.get(url)
    content = driver.page_source
    print(content)
finally:
    driver.quit()

This returns HTML, even though the page looks like plain text:

<html><head><link rel="stylesheet" href="resource://content-accessible/plaintext.css"></head><body><pre>52.230.152.0/24
52.233.106.0/24
</pre></body></html>

You might have to navigate to platform.openai.com first to pick up the necessary cookies.
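
Since the page source comes back wrapped in HTML, the ranges still need to be pulled out before they can feed a firewall script. A minimal sketch, assuming the output has the shape shown above (CIDR ranges separated by whitespace inside a pre element); extract_ranges is a hypothetical helper name, and the validation relies only on Python's standard ipaddress module:

```python
import ipaddress
import re


def extract_ranges(page_source: str) -> list[str]:
    """Pull valid CIDR ranges out of the HTML-wrapped page source."""
    # Replace every tag with a space so only the text content remains.
    text = re.sub(r"<[^>]+>", " ", page_source)
    ranges = []
    for token in text.split():
        # Only consider tokens written in CIDR notation.
        if "/" not in token:
            continue
        try:
            network = ipaddress.ip_network(token, strict=True)
        except ValueError:
            # Not a valid network (or host bits are set): skip it.
            continue
        ranges.append(str(network))
    return ranges


# Page source as returned by the Selenium snippet above.
sample = (
    '<html><head><link rel="stylesheet" '
    'href="resource://content-accessible/plaintext.css"></head>'
    "<body><pre>52.230.152.0/24\n52.233.106.0/24\n</pre></body></html>"
)

print(extract_ranges(sample))
```

Validating each token keeps a challenge page or any other unexpected HTML from being turned into firewall rules: such input simply yields an empty list.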

Hi,

Thanks for your reply. Yes, that will work too. However, I still believe that automated traffic should be whitelisted on this endpoint, so that anyone can automate this check without overcomplicating it.
Can anyone share a contact for the network or security team at OpenAI?

We’re having the same issue. @OpenAI, can you please add this endpoint to an allowlist or remove the restriction?