Hi all. Looking for some advice about a token rate limit issue.
I have a fine-tuned ada model that I use for sentence inference (a kind of hybrid classification/completion task).
I have around 48M sentences to process, so I have to use parallel processing to get through them in a reasonable time.
I tried two approaches:
- OpenAI's cookbook parallel processing script (api_request_parallel_processor.py), batched with 20 prompts per request (each prompt is one review sentence, ~50 tokens on average). It was quite slow, only about nine sentences per second, which would take forever at our data volume.
- Python futures, also batched by 20. Throughput is much higher (up to 100 sentences/s with 8 workers), but we hit the rate limit error way too early. When the limit is hit, the total processed token count (I counted tokens sent and returned) is under 150,000, so I'm confused about why we hit the limit while our own token counter is well below it.
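One possible explanation for the mismatch: per OpenAI's rate-limit guidance, the limiter charges each request for its prompt tokens plus the full requested max_tokens for every prompt in the batch, up front, regardless of how many completion tokens actually come back. A rough sketch of that accounting (the function name and the max_tokens=100 figure are mine, for illustration):

```python
def estimated_rate_limit_tokens(prompt_token_counts, max_tokens):
    """Estimate the tokens the rate limiter charges for one batched request.

    Assumes (per OpenAI's rate-limit docs) the limiter counts the requested
    max_tokens per prompt in advance, not the completion tokens returned.
    """
    return sum(prompt_token_counts) + max_tokens * len(prompt_token_counts)

# Example: a batch of 20 prompts of ~50 tokens each, with max_tokens=100,
# is charged 20*50 + 20*100 = 3000 tokens, even if each completion is short.
print(estimated_rate_limit_tokens([50] * 20, 100))  # → 3000
```

If your counter only tallies tokens actually sent and returned, the server-side count could be several times higher than yours, which would explain hitting 250,000 TPM while measuring under 150,000.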
Rate limit reached for default-global-with-image-limits in organization org-YYYYYYYY on tokens per min. Limit: 250000 / min. Current: 245443 / min. Contact support@openai.com if you continue to have issues.
According to this doc (OpenAI API), the rate limit for pay-as-you-go users (after 48 hours) should be 350,000 TPM (it also says that for ada, 1 TPM counts as 200 tokens).
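For what it's worth, even at the full 250,000 TPM the back-of-the-envelope math is daunting. Counting only the ~50 prompt tokens per sentence (so this is a best case that ignores completion and max_tokens accounting):

```python
TPM_LIMIT = 250_000          # tokens per minute, from the error message
TOKENS_PER_SENTENCE = 50     # average prompt length from the post
SENTENCES = 48_000_000       # total volume to process

sentences_per_min = TPM_LIMIT / TOKENS_PER_SENTENCE   # 5000/min, ~83/s
hours = SENTENCES / sentences_per_min / 60            # 160.0 hours
print(sentences_per_min, hours)
```

So ~83 sentences/s is the theoretical ceiling at this limit, and the full 48M sentences would take about 160 hours of wall-clock time even with perfect throttling, which is why the rate limit increase matters more than the parallelism strategy here.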
I emailed OpenAI support and submitted a rate limit increase request, but have had no reply so far.
Would appreciate any advice. Thanks!