What is this new field in rate limits (x-ratelimit-reset-tokens_usage_based)?

Hi there!
Below are the headers from 4 requests (the first 3 requests, plus 1 later request).

In them I found one more new field: x-ratelimit-reset-tokens_usage_based.
And reset-tokens shows strange behavior.

Request 1: (2023-12-03 19:19:12)

"x-ratelimit-limit-tokens": "1500000",
"x-ratelimit-limit-tokens_usage_based": "1500000",
"x-ratelimit-remaining-requests": "499",
"x-ratelimit-remaining-tokens": "1495621",
"x-ratelimit-remaining-tokens_usage_based": "1495621",
"x-ratelimit-reset-requests": "120ms",
"x-ratelimit-reset-tokens": "4m12.172s",
"x-ratelimit-reset-tokens_usage_based": "4m12.172s",

Request 2: (2023-12-03 19:19:14)

"x-ratelimit-limit-requests": "500",
"x-ratelimit-limit-tokens": "1500000",
"x-ratelimit-limit-tokens_usage_based": "1500000",
"x-ratelimit-remaining-requests": "499",
"x-ratelimit-remaining-tokens": "1482750",
"x-ratelimit-remaining-tokens_usage_based": "1490773",
"x-ratelimit-reset-requests": "120ms",
"x-ratelimit-reset-tokens": "16m33.562s",
"x-ratelimit-reset-tokens_usage_based": "8m51.438s",

Request 3: (2023-12-03 19:19:17)

"x-ratelimit-limit-tokens_usage_based": "1500000",
"x-ratelimit-remaining-requests": "498",
"x-ratelimit-remaining-tokens": "1478342",
"x-ratelimit-remaining-tokens_usage_based": "1486365",
"x-ratelimit-reset-requests": "176ms",
"x-ratelimit-reset-tokens": "20m47.458s",
"x-ratelimit-reset-tokens_usage_based": "13m5.333s",

And finally:

Request 4: (2023-12-03 19:41:59)

"x-ratelimit-limit-tokens": "1500000",
"x-ratelimit-limit-tokens_usage_based": "1500000",
"x-ratelimit-remaining-requests": "496",
"x-ratelimit-remaining-tokens": "43244",
"x-ratelimit-remaining-tokens_usage_based": "1212010",
"x-ratelimit-reset-requests": "464ms",
"x-ratelimit-reset-tokens": "23h18m29.144s", 
"x-ratelimit-reset-tokens_usage_based": "4h36m28.223s",

My account is Tier 2:
gpt-4-1106-preview: 5,000 RPM, 300,000 TPM, 1,500,000 TPD

This seems like a bug, or else I can't find the pattern by which the rate limit reset increases.
And what is x-ratelimit-remaining-tokens_usage_based? (There is no information about it in https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers.)

The reset-tokens time is how long it will take for your account to fully return to its original state, with your full token count available again.

It shows that your 100 out of 100,000 isn't kept on record for 24 hours until it finally expires; a different formula is used.


You make an intriguing revelation about the divergence of the old and new values after multiple calls.

It seems the “usage_based” value is the one charting a lower impact on your limit, and one wonders whether that is a new method, or the old method moved there…


More rate-limiting methods may have been inspired by previous discussion:

This just maps to your 1,500,000 tokens per day. You burned nearly 300k tokens, and so this is equivalent to about 4.6 hours of token burn.
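
To make that arithmetic concrete (a quick back-of-the-envelope check, assuming the refill rate is just the 1,500,000 TPD limit spread over 24 hours):

```python
# Check Request 4's usage_based headers against the tokens-per-day idea.
limit = 1_500_000       # x-ratelimit-limit-tokens_usage_based
remaining = 1_212_010   # x-ratelimit-remaining-tokens_usage_based
burned = limit - remaining            # 287,990 tokens, "nearly 300k"
reset_s = burned * 86_400 / limit     # time to refill at 1.5M tokens/day
print(f"{reset_s / 3600:.2f} hours")  # -> 4.61 hours, ~4h36m,
                                      #    matching "4h36m28.223s"
```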

Another formula idea is to imagine that your account is being continuously refilled with tokens at your rate limit. Individual calls then only need to be evaluated against the balance at that moment, as in the sketch below.
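
A minimal sketch of that idea in code (hypothetical; the class and method names are made up just to make the refill formula concrete):

```python
import time

class TokenBucket:
    """Continuously-refilled budget: the balance is only recomputed
    at the moment of each call, from the time elapsed since the last one."""

    def __init__(self, limit: int, per_seconds: float):
        self.limit = limit                  # e.g. 1_500_000 tokens
        self.rate = limit / per_seconds     # refill rate in tokens/second
        self.remaining = float(limit)
        self.last = time.monotonic()

    def try_spend(self, tokens: int) -> bool:
        now = time.monotonic()
        # Credit the refill accrued since the previous call, capped at limit.
        self.remaining = min(self.limit,
                             self.remaining + (now - self.last) * self.rate)
        self.last = now
        if tokens > self.remaining:
            return False                    # over the limit; caller backs off
        self.remaining -= tokens
        return True

    def reset_in(self) -> float:
        """Seconds until fully refilled, analogous to x-ratelimit-reset-tokens."""
        return (self.limit - self.remaining) / self.rate
```

With limit=1_500_000 and per_seconds=86_400, reset_in() produces exactly the kind of multi-hour reset values shown above.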

One can check whether either of these returned values matches that concept; see the check below. It's also worth pondering whether one of the methods can be exploited and therefore needs revision.
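
Running the plain reset-tokens numbers from all four requests through the refill idea, with the 1,500,000 TPD limit as the rate (my own script, nothing official):

```python
# Predict reset-tokens as deficit / (1.5M tokens per 86,400 s) and compare
# with the headers reported above for all four requests.
LIMIT = 1_500_000
observed = [            # (x-ratelimit-remaining-tokens, reported reset)
    (1_495_621, "4m12.172s"),
    (1_482_750, "16m33.562s"),
    (1_478_342, "20m47.458s"),
    (43_244,    "23h18m29.144s"),
]
for remaining, reported in observed:
    secs = (LIMIT - remaining) * 86_400 / LIMIT
    h, rem = divmod(secs, 3600)
    m, s = divmod(rem, 60)
    print(f"predicted {int(h)}h{int(m)}m{s:06.3f}s   reported {reported}")
```

All four predictions land within a fraction of a second of the reported headers, which would mean reset-tokens is currently computed against the daily rate rather than the 300,000 TPM rate.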


This just maps to your 1,500,000 tokens per day. You burned nearly 300k tokens, and so this is equivalent to about 4.6 hours of token burn.

But it seems I burned ~1.5M tokens, not 300k.

Initially x-ratelimit-limit-tokens === x-ratelimit-limit-tokens_usage_based, and both were 1.5M.
Logically, though, x-ratelimit-limit-tokens should be 300k (my Tokens Per Minute limit).

And reset-tokens shouldn't be able to exceed 1 minute.
But it was: 4m12.172s, 16m33.562s, 20m47.458s, 23h18m29.144s.

P.S. After a few hours, I still don't have another ‘300k tokens’.

Another formula idea is to imagine that your account is being continuously refilled with tokens at your rate limit. Individual calls then only need to be evaluated against the balance at that moment.

If that turns out to be true, then it is a problem. How do I work with this, and how do I know when I can make a request? :smiley:


Rate limits are not something that I approach regularly :laughing:

The most likely place to encounter them, apart from the betas with low rates or trying to resell GPT-4, is embeddings. One might want to send 5,000,000 tokens a minute to get files added to a vector search database, for example.

If you're looking at making parallel batch calls, you can just frame your own requests by the minute: count tokens and hold off until the next minute when you get near the rate limits. If you do encounter the error, treat it as a manual “pause” signal: stop your requests temporarily, add to the time before you request again, and learn a lower threshold. Something like the sketch below.
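
A rough sketch of that approach (untested; the class and method names here are made up for illustration):

```python
import time

class MinuteThrottle:
    """Self-imposed per-minute token budget with a learned backoff.
    The 300_000 default is the Tier 2 TPM from this thread; adjust to yours."""

    def __init__(self, tpm: int = 300_000, headroom: float = 0.9):
        self.budget = int(tpm * headroom)   # deliberately stay under the limit
        self.window_start = time.time()
        self.spent = 0

    def reserve(self, tokens: int) -> None:
        """Call with an estimated token count before sending each request."""
        now = time.time()
        if now - self.window_start >= 60:   # new minute, fresh budget
            self.window_start, self.spent = now, 0
        if self.spent + tokens > self.budget:
            # Hold off until the next minute rather than hit the limit.
            time.sleep(max(0.0, 60 - (now - self.window_start)))
            self.window_start, self.spent = time.time(), 0
        self.spent += tokens

    def on_rate_limit_error(self) -> None:
        """Treat a 429 as a manual 'pause' signal: wait, then resume
        with a lower threshold learned from the failure."""
        self.budget = int(self.budget * 0.8)
        time.sleep(60)
```

Call reserve(estimated_tokens) before each request, and on_rate_limit_error() in your exception handler.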

For interactive sessions, you'll have to report the need to wait to the user, and if it's not just you, a queue system would be fair.

There’s also a new “request exception” form at the bottom of the limits page if you have a unique case where OpenAI agrees you don’t simply need to pay more for a higher tier for a particular model.

Yeah, if this hasn't healed up yet, then there is a glitch, meaning they are tracking 300k per day, not 300k per minute, for the 300,000 TPM metric.