OpenAI’s TPD (Tokens Per Day) billing cycle is based on a natural day, correct? Which time zone is it aligned with? Does the limit reset immediately after the cycle ends?
The rate limit is actually a bit intriguing: it follows an unpublished formula. It doesn’t reset at a particular time of day, nor 24 hours after you make a call.
For example, take gpt-4-vision with a 100-per-day limit. One call reduces your available count to 99. However, after 50 minutes, that count is back at 100. Make two calls, and it takes 2+ hours before you are back at 100 again.
The daily limits are rather new, and are mostly for those models under preview or beta. The normal rates that one encounters are requests and tokens per minute.
There was even someone on the forum who was blocked for exceeding the per-minute limit too many times.
Those per-minute limits are also a bit coy. The input tokens are estimated when you make a call, and that estimate is what a request can be refused on. It is possible to make parallel API calls where the resulting output text vastly exceeds your per-minute rate, and then you are blocked for many minutes until that rate catches up.
So until OpenAI publishes its formula, and you can account for the lag caused by responses only counting against you after the fact, it’s a bit hard to program for. You have to estimate, add some buffer, and also catch the error if it does occur and retry later.
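The "catch the error and retry later" advice can be sketched roughly as below. This is only an illustration: RateLimitError is a hypothetical stand-in for whatever exception your client raises on a 429, and the retry count and delays are assumed values, not anything OpenAI recommends.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""


def call_with_retry(func, max_retries=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call func(); on RateLimitError, back off exponentially and try again."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the error
            # Double the wait on each attempt, capped at max_delay, and add
            # jitter so parallel workers do not all retry at the same moment.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

You would wrap the actual API call in a function or lambda and pass it as func. The sleep parameter exists only so the waiting can be swapped out when testing.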
It seems to be based on UTC. I checked this morning, on the 1st of December, and the Usage page still showed November 30. It changed some time later.
It’s actually a “floating day” or “floating minute”, and you have to examine your response headers to see when the resets are. In particular, look at ‘x-ratelimit-reset-requests’ (or ‘x-ratelimit-reset-tokens’) to see when you are full again.
'openai-model': 'gpt-4-1106-vision-preview',
'openai-processing-ms': '4584',
'openai-version': '2020-10-01',
'x-ratelimit-limit-requests': '10000',
'x-ratelimit-limit-tokens': '300000',
'x-ratelimit-remaining-requests': '9999',
'x-ratelimit-remaining-tokens': '299345',
'x-ratelimit-reset-requests': '8.64s',
'x-ratelimit-reset-tokens': '131ms'
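To act on these headers in code, the reset values first have to be turned into numbers. Here is a rough sketch of one way to do that. The function names are made up for illustration, and support for compound values such as '6m0s' is an assumption about the format; only '8.64s' and '131ms' appear in the dump above.

```python
import re

# Seconds per unit; "ms" is listed first in the pattern so that "131ms"
# is not misread as minutes followed by a stray "s".
_UNIT_SECONDS = {"ms": 0.001, "h": 3600.0, "m": 60.0, "s": 1.0}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_reset(value):
    """Convert a duration such as '8.64s', '131ms' or '6m0s' to seconds."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(value):
        if match.start() != pos:
            break  # unexpected characters between parts
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(value):
        raise ValueError(f"unrecognised duration: {value!r}")
    return total


def seconds_until_full(headers):
    """Time until both the request and the token quota are full again."""
    return max(parse_reset(headers["x-ratelimit-reset-requests"]),
               parse_reset(headers["x-ratelimit-reset-tokens"]))


example_headers = {
    "x-ratelimit-reset-requests": "8.64s",
    "x-ratelimit-reset-tokens": "131ms",
}
print(seconds_until_full(example_headers))
```

Taking the larger of the two values is a conservative choice: it waits for whichever quota takes longer to refill.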
The algorithm appears to be simple: both tokens and requests are drawn down as you use them, and are replenished at the rate of your limit, which is either a per-day or a per-minute limit. So each item takes 86400/ThingRate seconds to replenish in the per-day case, or 60/ThingRate seconds in the per-minute case, where ThingRate is your limit.
So in my example above, you see 8.64s for the replenishment of 1 request. Here my request limit per day is 10000, so 1*86400/10000 = 8.64 seconds, which is what is shown in the header.
Also above, the tokens are at a per-minute rate, and I am 655 tokens short of my max (I used 655 tokens in the request). So I recover at:
655*60/300000 = 0.131 sec which is 131 msec, which is also shown in the header.
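The two calculations above follow the same pattern, which can be written as a small helper. This is just the arithmetic from this post in code form; recovery_seconds is a made-up name, and the formula itself is inferred from the headers rather than published by OpenAI.

```python
def recovery_seconds(items_used, limit, window_seconds):
    """Time to regain items_used, if `limit` items refill per window."""
    return items_used * window_seconds / limit


# 1 request used against 10000 requests per day (86400 seconds)
print(recovery_seconds(1, 10_000, 86_400))   # 8.64

# 655 tokens used against 300000 tokens per minute (60 seconds)
print(recovery_seconds(655, 300_000, 60))    # 0.131
```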
If you make a bunch of requests at the same time, I am guessing they take your current remaining recovery time and then add the time for each individual event to give the new overall recovery time.
So it’s a pretty simple algorithm, and has no UTC epoch day, or anything like that associated with it.
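For anyone who wants to experiment with this, here is a rough simulation of the behaviour as I understand it: a bucket that refills continuously. The class and its methods are invented for illustration, and this is my guess at the model based on the headers, not OpenAI’s actual implementation.

```python
class RefillingBucket:
    """Quota that drains on use and refills continuously up to its limit."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.rate = limit / window_seconds  # items regained per second
        self.available = float(limit)

    def advance(self, seconds):
        """Let simulated time pass; the quota refills but never exceeds limit."""
        self.available = min(self.limit, self.available + seconds * self.rate)

    def try_take(self, n=1):
        """Use n items if there are enough available; report success."""
        if n > self.available:
            return False
        self.available -= n
        return True

    def seconds_until_full(self):
        """What an 'x-ratelimit-reset-*' style header would report."""
        return (self.limit - self.available) / self.rate


requests = RefillingBucket(limit=10_000, window_seconds=86_400)
requests.try_take(1)
print(round(requests.seconds_until_full(), 2))  # 8.64
```

Several calls made at once simply drain more from the bucket, so their recovery times add up, which matches the guess above.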
I was pleasantly surprised when I discovered this, because I was worried about hitting some mystery “reset cliff” at a specific time, but no such thing here.
Our billing cycle is based on UTC time. I will make a note in our rate limits page: https://platform.openai.com/docs/guides/rate-limits/how-do-these-rate-limits-work to mention the days are calculated based on UTC.
Note that we have no TPD limits; we only have an RPD (requests per day) limit.
Your response doesn’t quite characterize the rate limits themselves, though.
One does not hit a rate limit and then have to wait until 00:00 UTC to be able to make another call.
Nor is it as straightforward as ChatGPT Plus, where you must wait for that 1st message to expire from the three hour window before you can send number 41.
After a reset, as I described in my earlier reply, the API has no memory of what you did: make 5 calls now, and 100 more are available in a few hours.
It can be described as “you can operate continuously at a maximum rate that approaches your daily rate limit”, a description equally confusing.
Billing is indeed based on UTC time; one used to be able to see the next day roll over on the usage page when expanding by hour and then by minute, adjusted to local time. Now it is completely opaque, and the answer about billing only tells us which bin in the bar chart calls are placed in.
It could be based on UTC times internally, but the headers are showing that it is a rate-limiting system, so the API quotas are independent of time zone, because they are based on recovery rates.
For example, say you have 100 RPD, and you run 99 of them at 23:59 UTC. You do NOT get to run another 99 or 100 a minute later at 00:00 UTC, you have to wait until 23:59 UTC the NEXT DAY, based on the headers.
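To put numbers on that scenario, here is a small sketch using the replenishment model inferred from the headers. The date is arbitrary and the function names are made up; if the real system works differently, these figures will not hold.

```python
from datetime import datetime, timedelta, timezone

LIMIT = 100        # requests per day, as in the example
WINDOW = 86_400    # seconds in a day


def regained(used, elapsed_seconds, limit=LIMIT, window=WINDOW):
    """Requests recovered after elapsed_seconds, never more than were used."""
    return min(used, elapsed_seconds * limit / window)


def full_again_at(burst_time, used, limit=LIMIT, window=WINDOW):
    """Moment at which the whole quota is available again."""
    return burst_time + timedelta(seconds=used * window / limit)


# Hypothetical burst of 99 requests at 23:59 UTC
burst = datetime(2023, 12, 1, 23, 59, tzinfo=timezone.utc)

print(regained(99, 60))          # after one minute: about 0.07 of a request
print(full_again_at(burst, 99))  # 2023-12-02 23:44:36+00:00
```

So under this model, crossing 00:00 UTC changes nothing: one request comes back roughly every 14.4 minutes, and the full 100 is not available again until late the following day.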
If I am wrong on this, then I should probably run all my remaining quota at 23:50 or so each night. But I don’t do this because, based on the headers, I would lose out the entire next day of processing.
If so, great, even better! Just taking the conservative approach here
But sure, billing (month-to-month cutoffs) is locked to UTC; that makes the most sense for a worldwide system.