GPT-4 API - Confusing Ratelimit Headers

Hi there, I’ve been playing around with the GPT-4 API and am confused by the rate-limiting headers.

Here are the response headers from three sequential requests, each of which took about 20 seconds to load.

x-ratelimit-limit-requests: 200
x-ratelimit-remaining-requests: 419
x-ratelimit-reset-requests: 125.82

Then on the next request:

x-ratelimit-limit-requests: 200
x-ratelimit-remaining-requests: 199
x-ratelimit-reset-requests: 59.70

Then again, on the request after that:

x-ratelimit-limit-requests: 200
x-ratelimit-remaining-requests: 362
x-ratelimit-reset-requests: 108.64

My questions are:

  1. Why is it showing an almost random number for x-ratelimit-remaining-requests? I'd expect it to decrement by 1 on every request.
  2. How can my x-ratelimit-remaining-requests be higher than my x-ratelimit-limit-requests?
  3. What does x-ratelimit-reset-requests even mean? Is that the number of seconds/minutes until the rate limit resets? Again, why does that value feel random? How can it possibly be fluctuating like that?
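
The expectations behind these questions can be written down as a quick sanity check. This is only a sketch: it assumes the headers have already been collected into plain dicts of strings, and the function names are made up for illustration. It does not try to interpret x-ratelimit-reset-requests, since its unit is exactly what is in question here.

```python
# Hypothetical helpers; the header values below are the ones quoted in this post.

def check_ratelimit_headers(headers):
    """Return a list of inconsistencies found in one set of rate-limit headers."""
    limit = int(headers["x-ratelimit-limit-requests"])
    remaining = int(headers["x-ratelimit-remaining-requests"])
    problems = []
    if remaining > limit:
        problems.append(f"remaining ({remaining}) exceeds limit ({limit})")
    return problems


def remaining_deltas(samples):
    """Change in remaining-requests between consecutive responses.

    With a simple counter one would expect -1 each time (or a jump back up
    to the limit after a reset).
    """
    values = [int(h["x-ratelimit-remaining-requests"]) for h in samples]
    return [later - earlier for earlier, later in zip(values, values[1:])]


samples = [
    {"x-ratelimit-limit-requests": "200",
     "x-ratelimit-remaining-requests": "419",
     "x-ratelimit-reset-requests": "125.82"},
    {"x-ratelimit-limit-requests": "200",
     "x-ratelimit-remaining-requests": "199",
     "x-ratelimit-reset-requests": "59.70"},
    {"x-ratelimit-limit-requests": "200",
     "x-ratelimit-remaining-requests": "362",
     "x-ratelimit-reset-requests": "108.64"},
]

for number, headers in enumerate(samples, start=1):
    print(number, check_ratelimit_headers(headers))
print("deltas:", remaining_deltas(samples))
```

Run against the three responses above, the first and third fail the "remaining must not exceed limit" check, and neither delta is -1.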

Not sure if it’s a bug, but it’s very strange and confusing.

Thanks for accepting me to the API!


I’m running into rate limits with GPT-4 as well, with this strange header. But in my case it’s 300ms.

It’s weird that I am also getting a 429 when my x-ratelimit-remaining-requests is 199 (out of 200). Is there burst rate limiting?

I’ve discovered why I’m getting the 429 errors. I started logging the response body, and I get a JSON object like this:

{"error":{"message":"The server had an error while processing your request. Sorry about that!","type":"server_error","param":null,"code":null}}

It sounds like this should be a 500, not a 429.
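
For anyone else hitting this, one workaround is to look at the error body instead of trusting the status code alone. A rough sketch, assuming the body has the same shape as the JSON above; `classify_error` and its return labels are hypothetical names, not part of any official client:

```python
import json


def classify_error(status_code, body_text):
    """Guess what kind of failure a response represents.

    The body's error.type is checked first, because (as seen above) a 429
    can carry a "server_error" payload. Falls back to the status code when
    the body is missing or not in the expected shape.
    """
    try:
        error = json.loads(body_text).get("error")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        error = None
    if not isinstance(error, dict):
        error = {}

    if error.get("type") == "server_error":
        return "server_error"
    if status_code == 429:
        return "rate_limited"
    if status_code >= 500:
        return "server_error"
    return "other"


body = ('{"error":{"message":"The server had an error while processing your '
        'request. Sorry about that!","type":"server_error","param":null,'
        '"code":null}}')

print(classify_error(429, body))  # the case from this thread
print(classify_error(429, ""))    # a 429 with no usable body
```

With this, the 429 from this thread is treated as a server-side failure rather than as a sign that the request quota is exhausted.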