I am not sure if this is the right place to raise this question, but I was not able to get any help from the chat specialist.
We are getting 403 Forbidden responses when requesting the https://openai.com/gptbot-ranges.txt endpoint.
We use a custom script that periodically checks this endpoint and updates our firewall rules based on the response.
A few days ago we started receiving consecutive 403 Forbidden responses.
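To illustrate the kind of periodic check described above, here is a minimal sketch. It is not our production script: the function names looks_like_cidr_list and refresh_ranges and the destination path are hypothetical, and it assumes the endpoint returns one IPv4 CIDR range per line. The point it shows is that a 403 or an HTML challenge page should never overwrite the last known good list.

```shell
#!/usr/bin/env bash
# Sketch only: function names and file paths are illustrative assumptions.

# Succeed only if the file contains at least one IPv4 CIDR line
# and no non-blank line that is anything else (e.g. an HTML challenge page).
looks_like_cidr_list() {
  local file="$1"
  local cidr='^[0-9]{1,3}(\.[0-9]{1,3}){3}/[0-9]{1,2}[[:space:]]*$'
  # Require at least one CIDR-shaped line.
  grep -Eq "$cidr" "$file" || return 1
  # Reject the file if any non-blank line does not look like a CIDR range.
  if grep -Ev '^[[:space:]]*$' "$file" | grep -Evq "$cidr"; then
    return 1
  fi
  return 0
}

# Download the list to a temp file and replace the stored copy only when
# the status is 200 and the body validates; otherwise keep the old list.
refresh_ranges() {
  local url="$1" dest="$2"
  local tmp status
  tmp="$(mktemp)" || return 1
  # -o writes the body to the temp file; -w prints only the HTTP status code.
  status="$(curl -sS -o "$tmp" -w '%{http_code}' "$url")" || { rm -f "$tmp"; return 1; }
  if [ "$status" = "200" ] && looks_like_cidr_list "$tmp"; then
    mv "$tmp" "$dest"
  else
    echo "keeping previous list: HTTP $status from $url" >&2
    rm -f "$tmp"
    return 1
  fi
}
```

It could be called from cron as, for example, refresh_ranges https://openai.com/gptbot-ranges.txt /path/to/gptbot-ranges.txt, with the firewall reload triggered only when the function succeeds.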
It looks like this endpoint is protected by Cloudflare. If I send a request from the same source IP, but from a web browser, the traffic is allowed and I get a 200 OK reply with the list of IPs used by OpenAI.
I believe automated traffic should be allowed to reach this endpoint, so it should be whitelisted in Cloudflare so that requests made by scripts are not blocked.
Or is there a way to contact the team managing the WAF (Cloudflare), so that I can provide more details about this issue?
Here is an example of a blocked request:
curl -I https://openai.com/gptbot-ranges.txt
HTTP/2 403
date: Tue, 28 May 2024 11:38:45 GMT
content-type: text/html; charset=UTF-8
content-length: 15734
accept-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
critical-ch: Sec-CH-UA-Bitness, Sec-CH-UA-Arch, Sec-CH-UA-Full-Version, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Platform, Sec-CH-UA, UA-Bitness, UA-Arch, UA-Full-Version, UA-Mobile, UA-Model, UA-Platform-Version, UA-Platform, UA
cross-origin-embedder-policy: require-corp
cross-origin-opener-policy: same-origin
cross-origin-resource-policy: same-origin
origin-agent-cluster: ?1
permissions-policy: accelerometer=(),autoplay=(),browsing-topics=(),camera=(),clipboard-read=(),clipboard-write=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
referrer-policy: same-origin
x-content-options: nosniff
x-frame-options: SAMEORIGIN
cf-mitigated: challenge
cf-chl-out: IK8jVAWCs5ca3AzGbBqEJuz1Cxp4Tf+lCxYrWvpVRBo33iQeBsNYPjLa7WusXE210JapP4GansWlVt9g0mVSGMaDXkjWyyk5Mavlo6jJloPTasufLdWTmx6StkLek+P72mRPgHVZRM3tplpdaU6aiw==$7CVru0JZZJ6T9/XBgtHGpA==
cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
expires: Thu, 01 Jan 1970 00:00:01 GMT
set-cookie: __cf_bm=u5tZ_deRdYemMAUEqbH2xlV1OhQdMqcJcjZQf4jKiGQ-1716896325-1.0.1.1-uVQIrq370odBpIOQi8iIruiB4BwI.C.5ap_hoORoZ81GX8Rm3HDwplWlyNB4fj0Wsx1_3DW73Dc2Gpd5sbGI.w; path=/; expires=Tue, 28-May-24 12:08:45 GMT; domain=.openai.com; HttpOnly; Secure; SameSite=None
x-content-type-options: nosniff
server: cloudflare
cf-ray: 88adf5d23e41b1a0-WAW
alt-svc: h3=":443"; ma=86400
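The blocked response above carries the header cf-mitigated: challenge, which is how Cloudflare marks a response that served a challenge page instead of the requested content. A script can use that header to tell a challenge apart from an ordinary 403. The following is a minimal sketch; the function name is_cloudflare_challenge is hypothetical, and it only inspects headers passed on standard input.

```shell
# Sketch only: the function name is an illustrative assumption.
# Reads HTTP response headers on stdin and succeeds if they contain
# "cf-mitigated: challenge" (header names are case-insensitive).
is_cloudflare_challenge() {
  # Strip carriage returns so the end-of-line anchor matches CRLF headers.
  tr -d '\r' | grep -Eiq '^cf-mitigated:[[:space:]]*challenge[[:space:]]*$'
}
```

For example: curl -sI https://openai.com/gptbot-ranges.txt | is_cloudflare_challenge && echo "blocked by a Cloudflare challenge"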
And here is a request that succeeds, sent with the same headers a browser would send:
curl -I 'https://openai.com/gptbot-ranges.txt' \
-H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' \
-H 'accept-language: pl-PL,pl;q=0.9,en-US;q=0.8,en;q=0.7' \
-H 'cache-control: no-cache' \
-H 'pragma: no-cache' \
-H 'priority: u=0, i' \
-H 'sec-ch-ua: "Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "macOS"' \
-H 'sec-fetch-dest: document' \
-H 'sec-fetch-mode: navigate' \
-H 'sec-fetch-site: none' \
-H 'sec-fetch-user: ?1' \
-H 'upgrade-insecure-requests: 1' \
-H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'
HTTP/2 200
date: Tue, 28 May 2024 11:39:43 GMT
content-type: text/plain; charset=utf-8
content-length: 32
accept-ranges: bytes
access-control-allow-origin: *
age: 275360
cache-control: public, max-age=0, must-revalidate
content-disposition: inline; filename="gptbot-ranges.txt"
etag: "4635b1edcbc0e2f3239222285f16af75"
strict-transport-security: max-age=63072000
x-matched-path: /gptbot-ranges.txt
x-vercel-cache: HIT
x-vercel-id: arn1::fc92r-1716896383784-3f538073c369
cf-cache-status: DYNAMIC
set-cookie: __cf_bm=XRE5IaUEElGa.14hRahQZ3FuYcP.2mgrzdN83W_zlgk-1716896383-1.0.1.1-wj_RQubqeagMVngfNtMl6PouL4W5mN0iTSwyAYAhOyrXh5ou6puKYT9pZdeazCqf8iA6z60BADaEIaQ2QUCcMQ; path=/; expires=Tue, 28-May-24 12:09:43 GMT; domain=.openai.com; HttpOnly; Secure; SameSite=None
x-content-type-options: nosniff
set-cookie: _cfuvid=293tOfg1S8LGIM2TU87JoLNWGD.s97KV9xFNU6vOrq0-1716896383801-0.0.1.1-604800000; path=/; domain=.openai.com; HttpOnly; Secure; SameSite=None
server: cloudflare
cf-ray: 88adf73e6bb7b247-WAW
alt-svc: h3=":443"; ma=86400