I was running a bunch of batch jobs the other day, starting them one at a time, and I hit the error for reaching the enqueued token limit of 90,000. I checked the list of batch jobs and they were all either completed or failed due to the token limit; nothing was in progress.
So I cut the job down to a single task of less than 1,500 tokens (calculated with the tokenizer) to guarantee it would run, but I still got the token limit error. It didn’t succeed until I waited almost a day and reran the jobs one at a time, but now the token limit error has become very inconsistent. Each of my tasks takes no more than 1,500 tokens; sometimes a job of 50 tasks succeeds, sometimes a single task still hits the limit. And all of this happens when there are no in-progress jobs.
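For reference, this is roughly how I count the tokens per task and check that nothing is still queued (a minimal sketch assuming the official openai and tiktoken Python packages; the file name is just a placeholder, and the count only covers message contents, not per-message overhead):

```python
# Sketch: approximate per-request token counts for a batch JSONL, plus a status check
# on existing batches. "batch_input.jsonl" is a placeholder for my input file.
import json
import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("o200k_base")  # the encoding gpt-4o uses

with open("batch_input.jsonl") as f:
    for i, line in enumerate(f):
        body = json.loads(line)["body"]
        # Rough count: message contents only, ignores formatting overhead.
        n = sum(len(enc.encode(m["content"])) for m in body["messages"])
        print(f"request {i}: ~{n} tokens")

client = OpenAI()
for batch in client.batches.list(limit=20):
    print(batch.id, batch.status)  # expecting only "completed"/"failed", nothing "in_progress"
```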
This is the first time I have encountered this issue. Does anyone know the possible reasons and solutions? Any insights would be appreciated.
I’m pretty sure that this particular limit works on a daily reset, not by how quickly the jobs are performed and cleared to make queue space for more jobs.
This excerpt used to be on help.openai.com, but it has been wiped, likely to bring it in line with the simpler method actually in use:
Once your batch request is completed, your batch rate limit is reset, as your input tokens are cleared. The limit depends on the number of global requests in the queue. If the Batch API queue processes your batches quickly, your batch rate limit is reset more quickly.
Similar language remains in “understanding rate limits”, but it is dubious based on reports like yours.
Your rate limit depends on your usage tier, shown under “limits” in your account. The tier is based on how much you have paid OpenAI in the past and how much time has passed, and it is recalculated when you make a new payment.
Tier 1 pays the same per token to get a pittance of service (or pays more when denied the discount for anything over three minutes’ worth of API calls). You can’t even send one full-context chat completions call to gpt-4o to batch, and if you tried reducing that to 50k it would be instantly denied by the TPM limit anyway…
Thank you. I just tried running the same job and it is now running, so I do think the 90,000-token limit applies to currently running batch jobs and resets when they are done. This is also the first time I have seen these issues: failures on small jobs, and very slow processing of my tasks. Do you think this has something to do with the rate limit?
What you have no control over is how fast batch processing submits the jobs. Ideally, one would think that it turns off any per-minute rate limits that you have otherwise, so that 80k of a 50-call batch can be executed in parallel, distributed, and returned as quickly as there is computational availability.
Individual items in the results thus should not start failing because of model rate limits, which sit between the outside world and the models; otherwise, everyone whose TPM limit is lower than their batch queue could see failures.
“Slow” is how it is supposed to operate: it is a 24-hour service window. That it can sometimes turn around in 10 minutes gives many people false expectations. Getting the request bounced because you have hit a queue-size rate limit is at least a status that comes back faster.
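If you want to see exactly what came back, you can pull the error details off the failed batch object rather than relying on the dashboard. A minimal sketch (the batch ID is a placeholder, and I’m assuming the enqueued-limit failure surfaces in the batch’s errors field, as other batch failures do):

```python
# Sketch: inspect why a batch ended up "failed". The batch ID is a placeholder.
from openai import OpenAI

client = OpenAI()
batch = client.batches.retrieve("batch_abc123")

print(batch.status)  # e.g. "failed" when the enqueued token limit was hit
if batch.errors:
    for err in batch.errors.data:
        print(err.code, err.message)
```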
Excuse me for resurrecting the thread, but I have a problem very similar to the one described in the original post. I create batch requests, a few requests a day. It seems more and more common for me to run into batches failing with the following error:
“Enqueued token limit reached for gpt-4o in organization org-myorgid. Limit: 90,000 enqueued tokens. Please try again once some in_progress batches have been completed.”
The problem is, the message does not seem to point to a valid cause. I always run only one batch at a time, and start a new batch only after the previous one has completed. I am pretty certain I have no enqueued tasks at the moment of submitting a new batch job. Yet I am getting the above error, recently quite frequently.
Initially I thought that my requests could simply be too large. I did send overly large requests from time to time, but I implemented a limit in my service and now my batch JSONLs are no larger than 75k tokens. I was even able to submit a job with exactly the same file which had previously been rejected as too large, and it was processed successfully.
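For what it’s worth, my submission logic is essentially the following (a sketch, assuming the official openai Python package; file names and the polling interval are placeholders, and each input file is already capped at ~75k tokens by the limit I mentioned):

```python
# Sketch: submit exactly one batch at a time and wait for a terminal status
# before submitting the next. File names and poll interval are placeholders.
import time
from openai import OpenAI

client = OpenAI()

def run_batch_and_wait(path: str, poll_seconds: int = 60) -> str:
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    # "completed", "failed", "expired" and "cancelled" are the terminal states.
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)
    return batch.status

for path in ("job1.jsonl", "job2.jsonl"):
    print(path, run_batch_and_wait(path))
```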
I am not even sure I am reading the docs correctly. The rate limits page suggests that the limit applies only to enqueued tokens, which should free up the quota once jobs complete; the error returned by the API suggests the same thing. However, that does not seem to be the case. The limits page in my account, on the other hand, lists the limit as 90k tokens per day, which would explain why I run into limits, but it does not match my symptoms either: I am almost sure I have been able to run more than 90k tokens’ worth of jobs in one day. I also do not know what “per day” would mean exactly: a calendar day from midnight to midnight (and by which clock?), a 24-hour sliding window, or something else.
I think (again, I am not sure) that the issue of incorrectly failing batch jobs gets worse after sending a job that really is too large and fails for the correct reason, but which then seems to make the system choke and causes difficulties accepting further requests, even correctly sized ones. Again, this observation is not yet confirmed. Other possibilities are that the limits on my queue are recalculated or cleared with some delay after a job completes, or that the failure message is simply wrong and masks the actual problem.
Would anyone have any idea or hint as to why my batch jobs fail with “exceeded limit of enqueued tokens” even when my job is the only enqueued job and it is smaller than the limit?