Hi all, I'm looking for some advice about a token rate limit issue.
I have a fine-tuned ada model that I use for sentence inference (a kind of hybrid classification/completion).
I have around 48M sentences to process, so I have to use parallel processing to finish in a reasonable time.
I tried two approaches:
OpenAI’s cookbook parallel processing script (api_request_parallel_processor.py), batched with 20 prompts per request (each prompt is one review sentence, ~50 tokens on average). It was quite slow, only about nine sentences per second; at that rate it would take forever to process our data volume.
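For context, here is roughly how I build the requests JSONL file that the cookbook script consumes, 20 prompts per request (the model name, file path, and max_tokens value are placeholders, not my real settings; the completions endpoint accepts a list of prompts per request):

```python
import json

MODEL = "ada:ft-myorg-2023-01-01"  # placeholder fine-tune name
BATCH_SIZE = 20

def write_requests_file(sentences, path="requests.jsonl"):
    """Write one JSON request per line, each carrying a batch of up to 20 prompts."""
    with open(path, "w") as f:
        for i in range(0, len(sentences), BATCH_SIZE):
            batch = sentences[i:i + BATCH_SIZE]
            # max_tokens=1 is illustrative for a classification-style completion.
            request = {"model": MODEL, "prompt": batch, "max_tokens": 1}
            f.write(json.dumps(request) + "\n")

sentences = [f"review sentence {n}" for n in range(45)]
write_requests_file(sentences)  # 45 sentences -> batches of 20, 20, 5
```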
Python futures (concurrent.futures), also batched with 20 prompts per request.
The processing speed is much higher (up to 100 sentences/s with 8 workers), but we hit the rate limit error far too early. When the limit is hit, the total processed token count (I count tokens sent and returned) is under 150,000. So I am confused about why we hit the limit when our own token counter is well below it.
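In case it helps to see the shape of it, here is a minimal, self-contained sketch of the futures approach with a client-side tokens-per-minute throttle bolted on. The API call is stubbed out, and estimate_tokens is a crude word-count stand-in for a real tokenizer, so treat the names and numbers as assumptions rather than my exact code:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

TPM_BUDGET = 200_000  # stay safely below the 250k/min limit from the error
WINDOW = 60.0         # seconds

class TokenRateLimiter:
    """Block submissions once tokens sent in the last 60s would exceed the budget."""
    def __init__(self, budget, window=WINDOW):
        self.budget = budget
        self.window = window
        self.events = []  # (timestamp, tokens) pairs
        self.lock = threading.Lock()

    def acquire(self, tokens):
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop events that have aged out of the window.
                self.events = [(t, n) for t, n in self.events if now - t < self.window]
                used = sum(n for _, n in self.events)
                if used + tokens <= self.budget:
                    self.events.append((now, tokens))
                    return
            time.sleep(0.5)  # budget exhausted; wait for the window to roll

def estimate_tokens(batch):
    # Crude stand-in; a real tokenizer would be more accurate.
    return sum(len(s.split()) for s in batch) + len(batch)

limiter = TokenRateLimiter(TPM_BUDGET)

def process_batch(batch):
    limiter.acquire(estimate_tokens(batch))
    # The real openai completion call would go here; stubbed for the sketch.
    return [s.upper() for s in batch]

sentences = [f"review sentence {n}" for n in range(100)]
batches = [sentences[i:i + 20] for i in range(0, len(sentences), 20)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_batch, batches))
```

The idea is just to make the workers pause before the server-side limiter trips, instead of retrying after a 429.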
Rate limit reached for default-global-with-image-limits in organization org-YYYYYYYY on tokens per min. Limit: 250000 / min. Current: 245443 / min. Contact email@example.com if you continue to have issues.
According to this doc (OpenAI API), the rate limit for pay-as-you-go users (after 48 hours) should be 350,000 TPM (it also says that for ada, 1 TPM counts as 200 tokens).
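One thing I noticed doing the arithmetic on my own numbers: at 100 sentences/s with ~50 tokens each, the prompt tokens alone imply roughly 300k TPM, which already exceeds the 250k limit in the error message. If the limiter uses a rolling or sub-minute window (an assumption on my part), it could trip before a full minute's worth of tokens is visible in my cumulative counter:

```python
# Back-of-envelope using the numbers above.
sentences_per_sec = 100    # futures throughput with 8 workers
tokens_per_sentence = 50   # average prompt length
limit_tpm = 250_000        # limit quoted in the error message

implied_tpm = sentences_per_sec * tokens_per_sentence * 60
print(implied_tpm)              # prompt tokens alone, per minute
print(implied_tpm > limit_tpm)  # over the limit even before completion tokens
```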
I emailed OpenAI support and submitted a rate limit increase request, but have had no reply.
Would appreciate any advice. Thanks!