I was using gpt-4-1106-preview, which has a 128,000-token context window, with the chat completions endpoint, and everything was working fine. Then I started sending long prompts and within minutes began hitting a rate limit error telling me to wait a couple of seconds (my account is in Tier 1, so the limit should be 150,000 tokens per minute). I tried adding a backoff, but it did not help.
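For reference, the backoff I tried looked roughly like this (a minimal sketch; `RateLimitError` here is a stand-in for whatever exception your client raises on a 429, e.g. `openai.RateLimitError` in the official Python library):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the client library's rate-limit exception (hypothetical)."""


def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(); on a rate-limit error, sleep with exponential backoff and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

Note that, as discussed below, each retried call may still count against the daily budget even when it is rejected, so backoff alone doesn't make the problem go away.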
After multiple tries, I started hitting the daily rate limit of 500,000 tokens (since I'm in Tier 1).
Please check the usage in the following pictures:
Apparently I have reached neither the daily limit nor the per-minute limit, as far as I can see from the usage page, and I still have money in my account.
I'm lost. Could someone help me overcome this issue and explain what happened?
It looks like the token rate limits are enforced by the load balancer. It's just conjecture, but it's possible that the daily limit gets checked and incremented before the minute limit, so if you send a bunch of requests that get rejected by the minute limit, you can still exhaust your daily limit.
A lot of people are having problems with the rate limiting, and 500k tokens per day is indeed pretty low, unfortunately.
But even when I send around 75,000 tokens, the request gets rejected by the minute limit! Do you have any idea why, and what is the best approach to overcome this?
I agree that it looks like the rejections are consuming the daily limit, but I cannot find clear documentation of the rejection criteria, i.e. what each rejected request actually consumes.
One last question, please: in order to move to Tier 2, I have to spend $50. Is that a monthly spend ($600/year)? And do I have to add extra money in order to send requests and receive responses, or will all my requests and responses be consumed out of the $50?
If you are constantly hitting the rate limit, backing off, hitting the limit again, and backing off again, it's possible that a good fraction of your request budget is "wasted" on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.
Which gives a hint that a failed request is still a request.
Regarding the next tier, it's sufficient to pay the amount once. Then you should be clear to move up, provided that the time requirement is also fulfilled.
My only concern is why I'm getting errors about hitting the limits when I actually haven't!
For example, when I send a prompt of 86,000 tokens (the per-minute limit is 150,000), I get an error, and the message says the tokens used are approximately 69,000.
This is also just conjecture: are you using Arabic script? The tokens are calculated differently by the rate limiter than by the model, so it's possible that the rate limiter is considerably overestimating the token count.
No, I'm using English script, and I'm using tiktoken to count tokens. Whenever I have text, I tokenize it and count the tokens; if the count is bigger than 100,000 (since the limit is 150,000), I split the text into smaller chunks of at most 100,000 tokens each.
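Concretely, my chunking step looks something like this (a simplified sketch; the actual tiktoken calls, shown in the comment, assume the library is installed):

```python
def chunk_tokens(tokens, chunk_size=100_000):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# With tiktoken, the flow described above would be:
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-4-1106-preview")
#   tokens = enc.encode(text)
#   chunks = [enc.decode(c) for c in chunk_tokens(tokens)]
```

So each request I send should be well under the 150,000-token-per-minute limit according to tiktoken's count.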
You may unfortunately need to try even smaller chunks until you get your tier upgrade.
As I mentioned, the rate limiting doesn't seem to use tiktoken.
One thing to consider, as a last resort, is using OpenAI on Azure. I don't know whether you still need to get approved or what the signup process looks like now, but it might be an option.