Your rate-limit usage is calculated as the maximum of max_tokens and the estimated token count based on the character count of your request. Try to set max_tokens as close to your expected response size as possible.
Or, for the chat endpoint, don't send the max_tokens parameter at all. Then tokens you don't actually use aren't counted against your rate limit, and the entire non-input context length remains available for generating a response.
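To make that concrete, here's a minimal sketch of the counting logic as described above. The `max()` formula, the roughly-4-characters-per-token estimate, and the function name `estimated_rate_limit_cost` are my own illustrative assumptions, not a documented contract:

```python
import math

CHARS_PER_TOKEN = 4  # rough rule-of-thumb estimate, not an exact tokenizer


def estimated_rate_limit_cost(prompt: str, max_tokens: int | None) -> int:
    """Tokens a request likely counts against your rate limit (assumed model)."""
    input_estimate = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    if max_tokens is None:
        # Without max_tokens, only the character-based estimate applies.
        return input_estimate
    # With max_tokens set, the limiter takes the larger of the two,
    # so a generous max_tokens inflates the cost of every request.
    return max(input_estimate, max_tokens)


# A short prompt with max_tokens=4000 is charged as 4000 tokens, which is
# how "random" 429s can appear while you seem far below your usage limits.
print(estimated_rate_limit_cost("Summarize this paragraph.", 4000))  # 4000
print(estimated_rate_limit_cost("Summarize this paragraph.", None))  # 7
```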
This answer should be pinned to the homepage in the largest font available. Thank you @_j for finally explaining why I was getting random 429s when I was nowhere near any usage limits.
In my case, I was receiving 429s for longer prompts, even though max_tokens was the same for all requests.
I find the way tokens are counted towards the limit super confusing, and even ChatGPT was not able to point me in the right direction!