API rate limit hit but it was not actually hit


I am getting this error:

Rate limit reached for gpt-4-turbo-preview in organization on tokens per day (TPD): Limit 1500000, Used 1498667, Requested 3281.

Today is February 1st. My usage for today plus yesterday doesn't add up to more than 850,000 tokens.



According to my tier (tier 2), my limit is 1,500,000 TPD.

What is wrong here?

You cannot compare the billing information to the rate limiter. Billing uses a 00:00 UTC cutoff to separate usage into different days, while the rate limiter uses a rolling adaptive formula that is not separated into distinct days. The accounting can also be delayed in adding calls to billing.
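To make the difference concrete, here is a minimal sketch that totals the same usage log two ways: per UTC calendar day (as a billing page would) and over a trailing 24-hour window. The plain 24-hour window is only an assumed stand-in, since the limiter's real formula is not published, and the timestamps and token counts are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def calendar_day_totals(events):
    """Sum tokens per UTC calendar day, the way a billing page buckets usage."""
    totals = defaultdict(int)
    for ts, tokens in events:
        totals[ts.astimezone(timezone.utc).date()] += tokens
    return dict(totals)


def rolling_total(events, now, window=timedelta(hours=24)):
    """Sum tokens used in the window ending at `now` (assumed limiter model)."""
    return sum(tokens for ts, tokens in events if now - window < ts <= now)


# Hypothetical usage log: (timestamp, tokens). Not the poster's real data.
events = [
    (datetime(2024, 1, 31, 20, 0, tzinfo=timezone.utc), 700_000),
    (datetime(2024, 2, 1, 9, 0, tzinfo=timezone.utc), 750_000),
]
now = datetime(2024, 2, 1, 10, 0, tzinfo=timezone.utc)

per_day = calendar_day_totals(events)   # two separate daily buckets
last_24h = rolling_total(events, now)   # both calls fall in the same window

print(per_day)
print(last_24h)
```

With this made-up log, neither calendar day comes near 1,500,000 tokens, yet the trailing window holds both calls at once and is close to the limit.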

Also, all models in a model class count against the rate limit, while the usage page is only displaying one model in the screenshot.

The daily limit for the preview model is quite low, even though they no longer state “not for production use”.

Let’s say you upgrade to tier-3, which is 5 million tokens per day. That’s under 50 full 100k+ context API queries per day. (However, at $1 per 100k input, that also would mean spending $50 per day.)
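The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted in this thread (the tier 2 and tier 3 daily limits, 100k-token queries, and $1 per 100k input tokens); the function and constant names are just illustrative.

```python
PRICE_PER_100K_INPUT_USD = 1.00   # price figure quoted in the post above
TOKENS_PER_FULL_QUERY = 100_000   # a "full context" query, input side only


def daily_budget(tpd_limit,
                 tokens_per_query=TOKENS_PER_FULL_QUERY,
                 price_per_100k=PRICE_PER_100K_INPUT_USD):
    """Return (max whole queries per day, max input spend per day in USD)."""
    max_queries = tpd_limit // tokens_per_query
    max_spend = tpd_limit / 100_000 * price_per_100k
    return max_queries, max_spend


tier2 = daily_budget(1_500_000)
tier3 = daily_budget(5_000_000)

print("tier 2:", tier2)
print("tier 3:", tier3)
```

Any query larger than exactly 100k tokens, or any output tokens counted against the limit, pushes the real query count below these upper bounds.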

Tier-4: $250 paid and 14+ days since first successful payment – completely lifts daily limits of this model. So for large scale use, large scale prepayment is necessary…

Hopefully OpenAI will see that limiting those that have prepaid up to $99 to under $15 per day (in tokens), while others who have paid in similar or smaller amounts for months can spend $15 in 15 minutes, is a rather arbitrary limit.

Thanks for replying.

I understand that the rolling adaptive formula is different, but as you can see in the images, not even a monthly rolling formula would surpass 1,500,000 tokens. I mean, I used just 565,035 in the whole of January (which ended yesterday) and 768,387 in February (today).

The only explanation would be that there is a delay in adding calls to billings. But I doubt that, because I’ve seen billing updating pretty fast after I make a call.

That’s the only gpt4 turbo model I’ve used.

Agree, hopefully they can increase those limits soon.

You can make a series of calls, with streaming of responses disabled, and then get token usage statistics in the response.

You can also get the headers of a response (the with_raw_response method in Python) and obtain all the rate limit statistics, including the present state of daily limits.
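A minimal sketch of that approach, assuming the openai Python package at version 1.0 or later. The helper names are hypothetical, and the sample header values are invented so the snippet runs offline; which x-ratelimit-* headers you actually receive, and whether they cover the daily limit, is something to confirm from your own responses.

```python
RATE_LIMIT_PREFIX = "x-ratelimit-"


def extract_rate_limit_headers(headers):
    """Return every x-ratelimit-* header as a plain dict with lowercase keys."""
    return {
        key.lower(): value
        for key, value in headers.items()
        if key.lower().startswith(RATE_LIMIT_PREFIX)
    }


def call_and_inspect(client, **request):
    """Make one non-streaming chat call; return (usage, rate limit headers).

    `client` is assumed to be an openai.OpenAI instance (openai>=1.0).
    """
    raw = client.chat.completions.with_raw_response.create(**request)
    completion = raw.parse()  # the usual ChatCompletion object
    return completion.usage, extract_rate_limit_headers(raw.headers)


# Offline demonstration with invented header values.
sample_headers = {
    "Content-Type": "application/json",
    "x-ratelimit-limit-tokens": "300000",
    "x-ratelimit-remaining-tokens": "298500",
    "x-ratelimit-reset-tokens": "300ms",
}
limits = extract_rate_limit_headers(sample_headers)
print(limits)
```

In real use you would call something like `call_and_inspect(client, model="gpt-4-turbo-preview", messages=[...])` and log both return values for each request.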

With logging, you would be able to check whether OpenAI's remaining-token count accounts for your usage correctly. You can then also investigate how past usage expires from your pool.

Your own rate limiting, informed by tracking what you sent and received and by monitoring the headers, can keep you from reaching the point where the API denies your requests.
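One way to sketch such a client-side guard is a rolling token budget that you check before each call and update from the usage reported in each response. The class below is hypothetical, and it assumes a plain 24-hour sliding window, which may not match how OpenAI actually expires past usage. The demo reuses the numbers from the error message at the top of this thread.

```python
import time
from collections import deque


class RollingTokenBudget:
    """Track tokens spent in a sliding window and refuse to exceed a limit."""

    def __init__(self, limit, window_seconds=86_400, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._events = deque()  # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self):
        """Drop usage records that have aged out of the window."""
        cutoff = self.clock() - self.window
        while self._events and self._events[0][0] <= cutoff:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def can_spend(self, tokens):
        """True if `tokens` more would still fit under the limit right now."""
        self._expire()
        return self._used + tokens <= self.limit

    def record(self, tokens):
        """Record tokens actually consumed (e.g. usage.total_tokens)."""
        self._expire()
        self._events.append((self.clock(), tokens))
        self._used += tokens


# Demo with a fake clock so the window can be advanced instantly.
fake_now = [0.0]
budget = RollingTokenBudget(limit=1_500_000, clock=lambda: fake_now[0])
budget.record(1_498_667)                 # "Used" from the error message

blocked = not budget.can_spend(3_281)    # "Requested" from the error message
fake_now[0] = 86_400.0                   # 24 hours later the usage has expired
allowed_later = budget.can_spend(3_281)

print(blocked, allowed_later)
```

Comparing this local count against the remaining-token headers over time would show whether the sliding-window assumption is close to the real behavior.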

It seems that I was upgraded to tier 2 just today, and somehow that triggered the error. But the API is now working properly.

Thanks for the help!