Reproducible GPT-4 rate limit bug


We are seeing an incorrect response from GPT-4 rate limiting: we get rate-limit errors even though we are nowhere near the rate limit.

When checking the response headers in Postman we see the following:

A call resulting in 41 tokens:

But the rate limit is decreased by 2021 tokens:

Also, the remaining-requests header always shows 199, but that of course does not bother us :slight_smile:



The rate limit is not based on total tokens used. It is based on “completion_tokens” and “max_tokens”; “prompt_tokens” has nothing to do with it.

prompt_tokens and completion_tokens total 41 tokens in the example…

Read what I wrote again.

Here are the official docs:

Your rate limit is calculated as the maximum of max_tokens and the estimated number of tokens based on the character count of your request. Try to set the max_tokens value as close to your expected response size as possible.

You are using the wrong variables in your math.


Or, for the chat endpoint: don’t set or send the max_tokens parameter at all. Then tokens you don’t use aren’t counted against the rate limit, and all non-input context length can be used for generating a response.
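For illustration, a minimal request body with max_tokens simply left out (the model name and message content here are placeholders):

```python
import json

# Chat completions request payload with no "max_tokens" key at all.
# With it omitted, only the (estimated) input tokens count against
# the rate limit, and the full remaining context window is still
# available for the response.
payload = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}],
    # intentionally no "max_tokens" entry
}

body = json.dumps(payload)
print(body)
```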


This answer should be pinned to the homepage with the largest font available. Thank you @_j for finally explaining to me why I am getting random 429 when I am nowhere near any usage limits.

In my case, I was receiving 429s for longer prompts, even though max_tokens was the same for all requests.

I find the way tokens are counted toward the limit super confusing, and even ChatGPT was not able to point me in the right direction!
