I was running a bunch of batch jobs the other day, starting them one at a time, and I hit the error about reaching the enqueued token limit of 90,000. I checked the list of batch jobs and they were all either completed or failed due to the token limit; nothing was in progress.
So I reduced the job to a single task of under 1,500 tokens (measured with the tokenizer) to guarantee it would run, but I still got the token limit error. It didn't succeed until I had waited almost a day and reran the jobs one at a time, but now the token limit error has become very inconsistent. Each of my tasks takes no more than 1,500 tokens; sometimes a job of 50 tasks succeeds, and sometimes a single task still hits the limit. All of this happens while there are no in-progress jobs.
This is the first time I have encountered this issue. Does anyone know the possible reasons and solutions? Any insights would be appreciated.
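For anyone who wants to sanity-check a batch before submitting it, here is a minimal sketch of a pre-submission budget check against the 90,000 enqueued-token limit. The chars/4 heuristic is a crude stand-in for a real tokenizer such as tiktoken, and the task shape assumed below is the usual chat-completions JSONL row; both are assumptions, not an official recipe.

```python
QUEUE_LIMIT = 90_000  # enqueued-token limit reported in the error message

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def batch_token_estimate(tasks: list[dict]) -> int:
    """Sum estimated prompt tokens across all tasks in a batch (JSONL rows)."""
    total = 0
    for task in tasks:
        messages = task["body"]["messages"]
        total += sum(estimate_tokens(m["content"]) for m in messages)
    return total

def fits_in_queue(tasks: list[dict], already_enqueued: int = 0) -> bool:
    """True if this batch should fit under the enqueued-token limit,
    given tokens you believe are already sitting in the queue."""
    return already_enqueued + batch_token_estimate(tasks) <= QUEUE_LIMIT
```

A check like this won't explain the inconsistent errors, but it at least rules out the batch itself being oversized.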
I’m pretty sure that this particular limit works on a daily reset, not by how quickly the jobs are performed and cleared to make queue space for more jobs.
This excerpt used to be at help.openai.com - but it has been wiped, likely to agree with the simple method actually used:
Once your batch request is completed, your batch rate limit is reset, as your input tokens are cleared. The limit depends on the number of global requests in the queue. If the Batch API queue processes your batches quickly, your batch rate limit is reset more quickly.
Similar text remains in “understanding rate limits”, but it is dubious given reports like yours.
Your rate limit is set by your usage tier, shown under “limits” in your account. The tier is based on how much you have paid OpenAI in the past, and it is recalculated when enough time has passed and you make a new payment.
Tier 1 pays the same per token for a pittance of service (or pays more, being denied the discount, for anything over three minutes’ worth of API calls). You can’t even enqueue one full-context chat completions call to gpt-4o as a batch, and if you tried reducing that to 50k it would be instantly rejected by the TPM limit anyway…
Thank you. I just tried running the same job and it is now running, so I do think the 90,000-token limit applies to the currently running batch jobs and resets when they are done. This is also the first time I have seen these issues: small jobs failing, and extremely slow processing of my tasks. Do you think this has something to do with the rate limit?
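If the limit really does track currently running batches, you can approximate your remaining queue headroom from the batch listing. A sketch, with two loud assumptions: which statuses actually occupy queue space (I'm guessing "validating", "in_progress", and "finalizing"), and the `input_tokens` field, which is a per-batch estimate you would track yourself rather than something the list endpoint returns.

```python
# Statuses assumed (not confirmed) to still count against the enqueued limit.
ACTIVE_STATUSES = {"validating", "in_progress", "finalizing"}

def tokens_still_enqueued(batches: list[dict]) -> int:
    """Sum the tracked input tokens of batches that plausibly occupy the queue.
    `input_tokens` is your own bookkeeping, not an API field."""
    return sum(
        b.get("input_tokens", 0)
        for b in batches
        if b.get("status") in ACTIVE_STATUSES
    )
```

Comparing that sum plus your next batch's size against 90,000 before submitting would at least tell you whether the rejections line up with the "resets when jobs finish" theory.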
What you have no control over is how fast batch processing executes the jobs. Ideally, one would think it turns off any per-minute rate limits you otherwise have, so that the 80k tokens of a 50-call batch can be executed in parallel, distributed, and returned as quickly as there is computational availability.
Individual items in the results thus should not start failing because of model rate limits, which sit between the outside world and the models; otherwise, everyone whose TPM is lower than the batch queue limit could see failures.
“Slow” is how it is supposed to operate: a 24-hour service window. That it can sometimes turn around in 10 minutes gives many people false expectations. Getting the request bounced because you have hit a queue-size rate limit is at least a status that comes back faster.