Rate limit issue, very confused by the results

Hi all. Looking for some advice about a token rate limit issue.

I have a fine-tuned ada model that I use for sentence inference (a kind of hybrid classification/completion). I have around 48M sentences to process, so I have to use parallel processing to finish in a reasonable time.
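To show why throughput matters here, a quick back-of-envelope calculation (using my measured ~50 tokens per sentence and the 250,000 TPM limit from the error message below; these are my numbers, not official figures):

```python
# Back-of-envelope: how long would 48M sentences take at the TPM cap?
SENTENCES = 48_000_000
TOKENS_PER_SENTENCE = 50   # average, measured on our review sentences
TPM_LIMIT = 250_000        # the limit reported in the rate limit error

total_tokens = SENTENCES * TOKENS_PER_SENTENCE  # 2.4B tokens
minutes = total_tokens / TPM_LIMIT              # 9,600 minutes
print(f"{minutes / 60:.0f} hours at the TPM cap")
```

So even at the full rate limit, this is roughly a week of continuous processing, which is why any premature throttling hurts so much.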
I tried two approaches:

  1. OpenAI’s cookbook parallel-processing code, batched with 20 prompts per request (each prompt is one review sentence, ~50 tokens on average). The `api_request_parallel_processor.py` script was quite slow, at only about nine sentences per second. This will take forever at our data volume.

  2. Python futures, also batched by 20.
    The processing speed is much higher (up to 100 sentences/s with 8 workers), but we hit the rate limit error far too early. When the rate limit is hit, the total processed token count (I counted tokens sent and returned) is less than 150,000, so I am confused as to why we hit the limit when our token counter is well below it:
    Rate limit reached for default-global-with-image-limits in organization org-YYYYYYYY on tokens per min. Limit: 250000 / min. Current: 245443 / min. Contact support@openai.com if you continue to have issues.
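For reference, a minimal sketch of the futures approach I used (the `call_api` function here is a stand-in for the real completion request against the fine-tuned ada model; batch size and worker count match my runs):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 20   # 20 prompts per request, as in my runs
NUM_WORKERS = 8   # workers that gave ~100 sentences/s

def call_api(batch):
    # Stand-in for the real API call; in production this sends all
    # 20 prompts in a single completion request.
    return [f"completion for: {s}" for s in batch]

def process(sentences):
    # Split into fixed-size batches, then fan them out across workers.
    batches = [sentences[i:i + BATCH_SIZE]
               for i in range(0, len(sentences), BATCH_SIZE)]
    results = []
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        # pool.map preserves batch order, so results line up with input.
        for completions in pool.map(call_api, batches):
            results.extend(completions)
    return results
```

Note there is no client-side pacing here at all, which is presumably why 8 fast workers trip the limit so quickly.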

According to this doc (OpenAI API), pay-as-you-go users (after 48 hours) should have a limit of 350,000 TPM (it also says that for ada, 1 TPM counts as 200 tokens).

I sent an email to OpenAI support and submitted a request for a rate limit increase, but got no reply.

Would appreciate any advice. Thanks!

I read this in Rate Limit Advice | OpenAI Help Center:

Rate limits can be quantized, meaning they are enforced over shorter periods of time (e.g. 60,000 requests/minute may be enforced as 1,000 requests/second). Sending short bursts of requests or contexts (prompts+max_tokens) that are too long can lead to rate limit errors, even when you are technically below the rate limit per minute.

So it looks like whatever this quantization algorithm is, it triggers prematurely. This is very unfortunate, as you cannot get full API performance even when you are under your TPM limit.
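If the per-minute limit really is enforced in per-second slices, then pacing requests so that no single second exceeds limit/60 tokens should avoid the premature errors. A minimal client-side token-bucket sketch (my own workaround idea, not an OpenAI-provided mechanism):

```python
import threading
import time

class TokenBucket:
    """Client-side pacer: call acquire(n) before each request so short
    bursts never exceed the per-second slice of the TPM limit."""

    def __init__(self, tokens_per_minute):
        self.capacity = tokens_per_minute / 60.0  # per-second slice
        self.tokens = self.capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self, tokens):
        with self.lock:
            while True:
                now = time.monotonic()
                # Refill proportionally to elapsed time, capped at one slice.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.capacity)
                self.last = now
                if self.tokens >= tokens:
                    self.tokens -= tokens
                    return
                # Sleep just long enough for the deficit to refill.
                time.sleep((tokens - self.tokens) / self.capacity)
```

Each worker would call `bucket.acquire(batch_tokens)` before sending a batch; this smooths the 8-worker burst into a steady stream at the cost of some peak throughput.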

There is also another question I can’t seem to find a good answer to: why, with fine-tuned ada, is my TPM limit 250,000 and not 250,000 * 200, as the documentation suggests?

I have heard others on this forum say that the increased rate does NOT kick in automatically; you have to contact them to get it adjusted up.

Thanks for the reply, Curt.
… if only someone from OpenAI support would reply to my emails or quota-increase requests…