We are seeing incorrect behavior from GPT-4 rate limiting: we get rate-limit errors when we are nowhere close to hitting the rate limit.
When checking the response headers in Postman, we see the following:
A call uses a total of 41 tokens:
But the remaining rate limit is decreased by 2021 tokens:
Also, the remaining requests always show 199, but that of course does not bother us.
The rate limit is not based on total tokens. It is based on “completion_tokens” and “max_tokens”. The “prompt_tokens” has nothing to do with it.
prompt_tokens and completion_tokens together total 41 tokens, as per the example…
Read what I wrote again.
Here are the official docs: https://platform.openai.com/docs/guides/rate-limits/reduce-the-max_tokens-to-match-the-size-of-your-completions
> Your rate limit is calculated as the maximum of max_tokens and the estimated number of tokens based on the character count of your request. Try to set the max_tokens value as close to your expected response size as possible.
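That rule can be sketched as follows. The ~4 characters per token heuristic is an assumption (the docs only say the estimate is “based on the character count”), and the helper names are made up for illustration, not OpenAI’s actual implementation:

```python
# Sketch of how the rate limiter counts tokens per the quoted docs:
# it deducts max(max_tokens, estimated tokens from character count),
# NOT the actual prompt_tokens + completion_tokens from the usage field.

def estimated_tokens(prompt: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption, not the
    # limiter's exact estimator).
    return max(1, len(prompt) // 4)

def tokens_counted_against_limit(prompt: str, max_tokens: int) -> int:
    return max(max_tokens, estimated_tokens(prompt))

# A short prompt with max_tokens=2000 is counted as 2000 tokens against
# the limit, even if the actual completion uses only a handful of tokens.
print(tokens_counted_against_limit("Say hello.", max_tokens=2000))  # → 2000
```

This is why a call whose usage field reports only a few dozen tokens can still drain the token budget by thousands: the deduction is driven by max_tokens, not by actual usage.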
You are using the wrong variables in your math.
Or, for the chat endpoint: don’t set or send the max_tokens parameter at all. Then tokens you don’t use aren’t counted against the rate, and all non-input context length can be used for generating a response.
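For illustration, a minimal chat completions request body that follows this advice simply has no max_tokens key at all (the prompt text here is made up; the model and messages fields are the standard ones):

```python
import json

# Chat completions payload that omits max_tokens entirely, so the rate
# limiter only has the prompt's estimated size to count against the limit.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "user", "content": "Summarize rate limits in one sentence."}
    ],
    # Note: no "max_tokens" key at all -- do not send it as None or 0.
}

assert "max_tokens" not in payload
print(json.dumps(payload, indent=2))
```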
This answer should be pinned to the homepage in the largest font available. Thank you @_j for finally explaining to me why I am getting random 429s when I am nowhere near any usage limits.
In my case, I was receiving 429s for longer prompts, even though max_tokens was the same for all requests.
I find the way tokens are counted toward the limit super confusing, and even ChatGPT was not able to point me in the right direction!