OpenAI batch API gets stuck for hours with status `in_progress`

Hello all,

For about the past week we have been seeing serious issues with Batch API calls: the API gets stuck in the `in_progress` state for hours. This started roughly 10 days ago and has happened four times in the last few days.
Manually cancelling the stuck batch jobs does not help either. In the end we are left with no option but to wait for the API to become responsive again.
Once the API becomes responsive (usually after 16-24 hrs), it processes the same request within a reasonable time. This has happened even under minimal load (i.e. with requests carrying really small payloads).
Looking at previous posts, this seems to be a known issue, but has anyone found a concrete solution? The API getting stuck like this makes it unreliable for us.
We are using a fine-tuned gpt-3.5-turbo model in this case. Looking forward to suggestions and solutions here.
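For reference, here is roughly how we check for and cancel a stuck batch (a sketch assuming the official `openai` Python SDK; the 6-hour threshold is our own choice, not anything documented, and the client is passed in so the logic can be exercised without hitting the live API):

```python
import time

def cancel_if_stuck(client, batch_id, max_age_seconds=6 * 3600):
    """Cancel a batch that has sat in `in_progress` longer than max_age_seconds.

    `client` is expected to look like an openai.OpenAI() instance.
    Returns True if a cancel was issued, False otherwise.
    """
    batch = client.batches.retrieve(batch_id)
    age = time.time() - batch.created_at  # created_at is a Unix timestamp
    if batch.status == "in_progress" and age > max_age_seconds:
        client.batches.cancel(batch_id)
        return True
    return False
```

As noted above, though, cancelling hasn't actually unblocked anything for us.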

Can you use 4o-mini instead? It’s even cheaper than 3.5

Also, the stated turnaround time is 24 hrs. Are you experiencing delays longer than that?

Thanks @nicholishen. Is it guaranteed/expected that this issue will not occur with 4o-mini?
Max delay I have seen is close to 24 hrs, so far not longer than that.

Also, it would be really helpful if you could share a link to the stated turnaround time.

You should budget 24 hrs for any batch job. My mini batches have come back pretty fast, but my batch jobs haven’t been that big either.

Batch mode can take anywhere from instant to 24 hours. The cost saving is afforded by using spare time on the compute clusters; when they are busy, it can take longer than at times when there is more spare capacity.

Basically you should only use it for tasks where a 24 hour delay will not cause an issue.

Got it @nicholishen. Are you using gpt-4o-mini for your batch jobs?

mini is my go-to. I’m mostly using structured outputs now, and when mini can’t cut it I bump up to 4o, but mini gets the job done for most things.

Got it, thanks for your suggestion here.

I also encountered this issue when using GPT-4o-mini.

We ran into the same issue this morning. The Batch API has been stuck in `in_progress` status for an extended period. :frowning:

We’re using gpt-4o. Tried gpt-4o-mini as well, but no luck. There are only a few records in the batch. I’ll let the batch run and check the status tomorrow.

Hi @rajat2 @b00802884 @ajithr007!

Batch API has been my go-to for many use cases, and I haven’t had too many issues with any of the supported models. I have had instances where the finalizing stage (when the output file is being written) takes a long time, but it always completes within 24h.

With that being said, and as others have pointed out, completion within 24h is not actually guaranteed - the Batch API uses spare compute capacity, and you are only charged for completed jobs (at a 50% discount), which also doesn’t count towards your “standard model” token limits or rate limits - so it’s very cheap.

The best practice is just to ensure that you don’t put any time-sensitive jobs in there, and to add some extra handling that checks for the `expired` state and then retries the batch.
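For illustration, that retry handling could look something like this sketch with the official `openai` Python SDK (error handling and logging omitted; the client is passed in so it can be tested with a stub):

```python
def retry_if_expired(client, batch_id):
    """Resubmit a batch that ended up `expired`, reusing its input file.

    `client` is expected to look like an openai.OpenAI() instance.
    Returns the original batch object if no retry was needed,
    otherwise the newly created batch.
    """
    batch = client.batches.retrieve(batch_id)
    if batch.status != "expired":
        return batch  # nothing to do
    return client.batches.create(
        input_file_id=batch.input_file_id,
        endpoint=batch.endpoint,    # e.g. "/v1/chat/completions"
        completion_window="24h",
    )
```

Expired batches only bill you for the requests that did complete, so resubmitting the same input file is a reasonable default.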

Even though the maximum number of requests per `.jsonl` file is 50,000, I still prefer to split it into sub-batches and run those, but it sounds like you are all running a small number of requests anyway.
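Splitting is just chunking the request list before writing the `.jsonl` files - something like this sketch (the 50,000 default reflects the documented per-file cap; the chunk size you actually pick is up to you):

```python
import json

def split_into_subbatches(requests, max_per_batch=50_000):
    """Yield .jsonl payload strings, each with at most max_per_batch requests.

    Each item in `requests` is a dict in the Batch API request format, e.g.
    {"custom_id": "...", "method": "POST", "url": "...", "body": {...}}.
    """
    for i in range(0, len(requests), max_per_batch):
        chunk = requests[i:i + max_per_batch]
        yield "\n".join(json.dumps(r) for r in chunk)
```

Each yielded string can then be uploaded as its own input file and submitted as a separate batch.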
