Hi all. Looking for some advice about a token rate limit issue.
I have a fine-tuned ada model that I use for sentence inference (a kind of hybrid classification/completion task).
I have around 48M sentences to process, so I have to use parallel processing to get through them in a reasonable time.
I tried two approaches:
- OpenAI's cookbook parallel processing script (api_request_parallel_processor.py), batched with 20 prompts per request (each prompt is one review sentence, ~50 tokens on average). It was quite slow, only about nine sentences per second, which would take forever at our data volume.
- Python futures, also batched by 20. Throughput is much higher (up to 100 sentences/s with 8 workers), but we hit the rate limit error way too early. When the limit is hit, the total processed token count (I counted tokens sent and returned) is under 150,000, so I'm confused about why we hit the limit while our own token counter is well below it.
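One possible explanation for the mismatch: per OpenAI's rate-limit guidance, the limiter charges each request for its prompt tokens plus the full requested max_tokens for every prompt in the batch, up front, regardless of how many completion tokens actually come back. A rough sketch of that accounting (the function name and the max_tokens=100 figure are mine, for illustration):

```python
def estimated_rate_limit_tokens(prompt_token_counts, max_tokens):
    """Estimate the tokens the rate limiter charges for one batched request.

    Assumes (per OpenAI's rate-limit docs) the limiter counts the requested
    max_tokens per prompt in advance, not the completion tokens returned.
    """
    return sum(prompt_token_counts) + max_tokens * len(prompt_token_counts)

# Example: a batch of 20 prompts of ~50 tokens each, with max_tokens=100,
# is charged 20*50 + 20*100 = 3000 tokens, even if each completion is short.
print(estimated_rate_limit_tokens([50] * 20, 100))  # → 3000
```

If your counter only tallies tokens actually sent and returned, the server-side count could be several times higher than yours, which would explain hitting 250,000 TPM while measuring under 150,000.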
Rate limit reached for default-global-with-image-limits in organization org-YYYYYYYY on tokens per min. Limit: 250000 / min. Current: 245443 / min. Contact support@openai.com if you continue to have issues.
According to this doc (OpenAI API), the rate limit for pay-as-you-go users (after 48 hours) should be 350,000 TPM (it also says that for ada, 1 TPM counts as 200 tokens).
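For what it's worth, even at the full 250,000 TPM the back-of-the-envelope math is daunting. Counting only the ~50 prompt tokens per sentence (so this is a best case that ignores completion and max_tokens accounting):

```python
TPM_LIMIT = 250_000          # tokens per minute, from the error message
TOKENS_PER_SENTENCE = 50     # average prompt length from the post
SENTENCES = 48_000_000       # total volume to process

sentences_per_min = TPM_LIMIT / TOKENS_PER_SENTENCE   # 5000/min, ~83/s
hours = SENTENCES / sentences_per_min / 60            # 160.0 hours
print(sentences_per_min, hours)
```

So ~83 sentences/s is the theoretical ceiling at this limit, and the full 48M sentences would take about 160 hours of wall-clock time even with perfect throttling, which is why the rate limit increase matters more than the parallelism strategy here.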
I emailed OpenAI support and submitted a rate limit increase request, but have had no reply so far.
Would appreciate any advice. Thanks!