I was using gpt-4-1106-preview, which has a 128,000-token context window, with the chat completions endpoint, and everything was working fine. Then I started sending long prompts and within minutes began hitting a rate limit error telling me to wait a couple of seconds (my account is in Tier 1, so the limit should be 150,000 tokens per minute). I tried adding a backoff, but it did not help.
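For reference, the backoff I tried looked roughly like this (a minimal sketch; `RateLimitError` here is a stand-in for whatever exception your client raises on a 429, e.g. `openai.RateLimitError` in the official Python library):

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for the client library's rate-limit exception (hypothetical)."""


def with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call fn(); on a rate-limit error, sleep with exponential backoff and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # exponential backoff with jitter: ~1s, ~2s, ~4s, ... capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

Note that, as discussed below, each retried call may still count against the daily budget even when it is rejected, so backoff alone doesn't make the problem go away.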
After multiple tries, I started hitting the daily rate limit of 500,000 tokens (since I'm in Tier 1).
Please check the usage in the following pictures:
Apparently I have reached neither the daily limit nor the per-minute limit, as far as I can see from the usage page, and I still have money in my account.
I'm lost. Could someone help me overcome this issue and explain what happened?
It looks like the token rate limits are enforced by the load balancer. It's just conjecture, but it's possible that the daily limit gets checked and incremented before the minute limit, so if you send a bunch of requests that get rejected by the minute limit, you can still exhaust your daily limit.
A lot of people are having problems with the rate limiting, and 500k tokens per day is indeed pretty low, unfortunately.
But even when I send around 75,000 tokens, the request gets rejected by the minute limit! Do you have any idea why, and what is the best approach to overcome this?
I agree that it looks like the rejections are consuming the daily limit, but I cannot find clear documentation of the rejection criteria, i.e. what each rejected request actually consumes.
One last question, please: in order to move to Tier 2, I have to spend $50. Is that a monthly spend ($600/year)? And do I have to add extra money in order to send requests and receive responses, or will all my requests and responses be consumed out of the $50?
If you are constantly hitting the rate limit, backing off, hitting the limit again, and backing off again, it's possible that a good fraction of your request budget is "wasted" on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.
Which gives a hint that a failed request is still a request.
Regarding the next tier, it's sufficient to pay the amount once. Then you should be clear to move up, provided that the time requirement is also fulfilled.
My only concern is why I'm getting errors about hitting the limits when I actually haven't!
For example, when I send a prompt of 86,000 tokens (the per-minute limit is 150,000), I get an error, and the message says the tokens used are approximately 69,000.
This is also just conjecture: are you using Arabic script? The tokens are calculated differently by the rate limiter than by the model, so it's possible that the rate limiter is considerably overestimating the token count.
No, I'm using English script, and I'm using tiktoken to count tokens. Whenever I have text, I tokenize it and count the tokens; if the count is bigger than 100,000 (since the limit is 150,000), I split the text into smaller chunks of at most 100,000 tokens each.
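Concretely, my chunking step looks something like this (a simplified sketch; the actual tiktoken calls, shown in the comment, assume the library is installed):

```python
def chunk_tokens(tokens, chunk_size=100_000):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# With tiktoken, the flow described above would be:
#   import tiktoken
#   enc = tiktoken.encoding_for_model("gpt-4-1106-preview")
#   tokens = enc.encode(text)
#   chunks = [enc.decode(c) for c in chunk_tokens(tokens)]
```

So each request I send should be well under the 150,000-token-per-minute limit according to tiktoken's count.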
You may unfortunately need to try even smaller chunks until you get your tier upgrade.
As I mentioned, the rate limiting doesn't seem to use tiktoken.
One thing to consider, as a last resort, is using OpenAI on Azure. I don't know whether you still need to get approved or what the signup process looks like now, but it might be an option.