Reaching the 1M token-per-minute (TPM) limit with parallel API calls

Hey there, we have an issue with the token rate limit (TPM) and a hard time understanding the reason.

We are on Tier 4 with 1M TPM on gpt-3.5-turbo and are processing an array of 100+ items for translation. When we loop through the array with parallel API calls, we quickly reach a reported token consumption that is roughly 10x higher than the real token consumption, as visible in the remainingTokens response header.
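
For reference, this is roughly how we compare the two numbers on a single call — a minimal sketch using plain HTTP against the chat completions endpoint (the API key and prompt are placeholders):

```python
import requests

API_KEY = "sk-..."  # placeholder

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Translate 'Hallo Welt' to English."}],
        "max_tokens": 256,
    },
    timeout=60,
)

# What this call actually consumed vs. what the rate limiter says is left.
print("usage.total_tokens:            ", resp.json()["usage"]["total_tokens"])
print("x-ratelimit-remaining-requests:", resp.headers.get("x-ratelimit-remaining-requests"))
print("x-ratelimit-remaining-tokens:  ", resp.headers.get("x-ratelimit-remaining-tokens"))
```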

Total tokens used across all items: ~27,000

In these screenshots (from Retool Workflows) we ran the process sequentially (top) and in parallel (bottom):

Sequential:
The reported remainingRequests & remainingTokens are somehow wrong; they don't reflect the actual consumption. Regardless of how many API calls we make, it keeps saying 9,999 requests remaining. One reason seems to be the reset, which happens after a few milliseconds. But shouldn't our real consumption of requests & tokens be reflected here? Naturally, we would expect the numbers to decrease until they reach the total requests/tokens consumed, or not? (reference here)

In parallel:
Here we see the extreme opposite: the calculation says remainingTokens dropped below 840,798 tokens and, a few API calls later, below 829,932 tokens, while the real token consumption is around 27,000. That's 140,000+ tokens off.

The system could fire 20×100 API calls in parallel, which would still be below our request limit (of 10,000). However, as soon as we execute, we instantly hit the token rate limit, even though 20×27,000 tokens = 540,000 tokens, which is still under the 1M TPM.

Rate limit reached for gpt-3.5-turbo in organization org- on tokens per min (TPM): Limit 1000000, Used 999247, Requested 3824. Please try again in 184ms.
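
A common mitigation for these 429s is to cap concurrency and retry with exponential backoff. A minimal sketch, assuming the openai Python package (v1+); the call_one() helper that performs a single translation request is hypothetical:

```python
import asyncio
import random

from openai import RateLimitError  # assumes openai Python package >= 1.0

async def translate_all(items, call_one, max_concurrency=10, max_retries=5):
    """Run call_one(item) for every item with bounded concurrency and 429 backoff."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with sem:
            for attempt in range(max_retries):
                try:
                    return await call_one(item)
                except RateLimitError:
                    # Exponential backoff with jitter before retrying this item.
                    await asyncio.sleep(2 ** attempt + random.random())
            raise RuntimeError(f"Gave up after {max_retries} rate-limit errors")

    return await asyncio.gather(*(guarded(i) for i in items))
```

This doesn't explain the header numbers, but it keeps the batch running instead of failing outright when a burst trips the limiter.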

The Question:
In this article (here) it states:

Rate limits can be quantized, meaning they are enforced over shorter periods of time (e.g. 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests or contexts (prompts+max_tokens) that are too long can lead to rate limit errors, even when you are technically below the rate limit per minute.

(reference also here: Is there any problem if i exceed TPM limit? - #4 by N2U)

Does this also apply to the tokens? Maybe that would explain the spike in reported token consumption.
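
A quick back-of-the-envelope under that assumption (i.e. the limiter counts prompt_tokens + max_tokens for every in-flight request; all numbers below are illustrative, not our exact values):

```python
# Illustrative only: if the limiter charges prompt + max_tokens per in-flight request,
# 100 parallel calls "use up" far more than they end up actually consuming.
n_parallel  = 100     # items fired at once
prompt_toks = 300     # assumed average prompt size per item
max_tokens  = 1200    # max_tokens set on each request

reserved = n_parallel * (prompt_toks + max_tokens)  # 150,000 tokens counted up front
actual   = 27_000                                   # tokens actually consumed in the end

print(reserved, round(reserved / actual, 1))        # 150000, ~5.6x the real consumption
```

With numbers in that range, the reserved amount would be in the same ballpark as the discrepancy we observe, which is why we suspect the max_tokens-based accounting.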

Best