Here is a simplified breakdown of the Batch API issue I’m trying to troubleshoot.
Core problem
A single request works, but a 15-request Batch stalls at in_progress 0/15.
What I have verified:
- The same request body works synchronously in about 7 seconds.
- A one-request Batch using the same JSONL line completes.
- The 15-request Batch validates and moves to in_progress, but stays at 0/15 for over an hour.
- There is no error_file_id, no failed request count, and no validation error.
- The JSONL parses cleanly and has unique custom_id values.
- The 15-request file is about 292 KB.
- Each request line is about 15.8k–21.8k characters.
- One measured direct request used about 4.3k prompt tokens and around 500–750 output tokens.
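For anyone wanting to repeat the JSONL checks above, here is a minimal sketch using only the Python standard library. The function name inspect_batch_jsonl and the report keys are made up for illustration. It only confirms that every line parses, that custom_id values are unique, and how long each line is. It says nothing about how the Batch service itself validates the file.

```python
import json
from collections import Counter


def inspect_batch_jsonl(text: str) -> dict:
    """Parse Batch-style JSONL text and report basic per-line statistics."""
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as U+2028, which may legally appear inside a JSON string.
    lines = [ln for ln in text.split("\n") if ln.strip()]
    ids = []
    lengths = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {number} is not valid JSON: {exc}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {number} is not a JSON object")
        ids.append(record.get("custom_id"))
        lengths.append(len(line))
    counts = Counter(i for i in ids if i is not None)
    return {
        "requests": len(lines),
        "missing_custom_id": sum(1 for i in ids if i is None),
        "duplicate_custom_ids": sorted(k for k, n in counts.items() if n > 1),
        "min_chars": min(lengths) if lengths else 0,
        "max_chars": max(lengths) if lengths else 0,
        "total_bytes": len(text.encode("utf-8")),
    }
```

To run it against a real file, read the file as UTF-8 text and pass the contents in. A clean file should report no missing or duplicate custom_id values.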
Current hypothesis
This does not look like a basic JSONL syntax problem.
It may be related to Batch-specific scheduling, enqueued prompt-token limits, request complexity, or how multi-request batches are handled for this model/endpoint combination.
The newer batch is also materially larger than the last working one, mainly because the system prompt and QA context grew.
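To put a rough number on the enqueued-token hypothesis, here is a back-of-the-envelope sketch. It assumes all 15 requests are close to the one measured request (about 4.3k prompt tokens), which has not been verified. Whether the queue limit also counts the max_completion_tokens allowance is a further assumption, so both figures are shown. The function name is made up for illustration.

```python
def estimate_enqueued_tokens(requests: int, prompt_tokens: int,
                             max_completion_tokens: int = 0) -> int:
    """Rough upper bound on tokens a batch might enqueue.

    Assumes every request has the same prompt size. Pass
    max_completion_tokens only if the limit is believed to count it.
    """
    return requests * (prompt_tokens + max_completion_tokens)


# Prompt tokens only: 15 * 4300
print(estimate_enqueued_tokens(15, 4300))        # 64500
# Including the 1600-token completion allowance: 15 * (4300 + 1600)
print(estimate_enqueued_tokens(15, 4300, 1600))  # 88500
```

Either figure only means something when compared against the Batch queue limit the account actually has for this model. That limit is not known here.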
Things I am unsure about
- Can a multi-request Batch stay at in_progress 0/N because of Batch queueing or enqueued prompt-token limits, even when a one-request Batch works?
- Are prompt_cache_key and prompt_cache_retention safe to include in /v1/chat/completions Batch request bodies, or should I remove them and let caching happen automatically?
- Can strict structured outputs (response_format: json_schema, strict: true) contribute to hidden retries or delayed Batch processing, even when the same request works synchronously?
- Is there any way to get deeper diagnostics for a Batch that validates but never starts completing requests?
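On the diagnostics question, one low-effort step is to log the fields the Batch object already exposes each time it is polled, so that any change in status, counts, or error fields is captured with a timestamp. The sketch below formats such a snapshot. It assumes the batch has been converted to a plain dict, for example via client.batches.retrieve(batch_id).model_dump() with the official openai Python SDK. The function name summarize_batch is made up for illustration. This does not reveal anything beyond what the API returns.

```python
import time
from typing import Optional


def summarize_batch(batch: dict, now: Optional[float] = None) -> str:
    """Format the diagnostic fields of a Batch object as one log line.

    `batch` is assumed to be a plain dict with the Batch object's fields
    (status, request_counts, error_file_id, output_file_id, errors,
    in_progress_at as Unix seconds).
    """
    now = time.time() if now is None else now
    counts = batch.get("request_counts") or {}
    started = batch.get("in_progress_at")
    minutes = None if started is None else round((now - started) / 60)
    errors = (batch.get("errors") or {}).get("data") or []
    parts = [
        f"status={batch.get('status')}",
        f"completed={counts.get('completed', 0)}/{counts.get('total', 0)}",
        f"failed={counts.get('failed', 0)}",
        f"error_file_id={batch.get('error_file_id')}",
        f"output_file_id={batch.get('output_file_id')}",
        f"errors={len(errors)}",
        f"minutes_in_progress={minutes}",
    ]
    return " ".join(parts)
```

Calling this every few minutes and appending the line to a log gives a timeline that can be attached to a support request if the Batch never progresses.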
Tests I am considering
I want to change only one thing at a time:
- Remove prompt_cache_key and prompt_cache_retention from the JSONL and retry the same 15-request Batch.
- Split the same 15 requests into smaller Batch files, for example 2-request or 5-request chunks.
- Reduce max_completion_tokens from 1600 to something closer to observed output size, such as 900, mainly as an efficiency test.
- If Batch remains unreliable, process this small canary synchronously and keep Batch only for larger stable runs.
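The first three tests can each be produced from the original JSONL with a small, separate transformation, so that every variant differs from the baseline in exactly one respect. The sketch below assumes each line is a JSON object whose request parameters sit under a "body" key, as in the Batch input format. The helper names are made up for illustration. Re-serializing a line may change whitespace or key escaping, but not the parsed content.

```python
import json
from typing import List

CACHE_FIELDS = ("prompt_cache_key", "prompt_cache_retention")


def strip_cache_fields(line: str) -> str:
    """Return the request line with the prompt-cache fields removed from its body."""
    record = json.loads(line)
    body = record.get("body", {})
    for field in CACHE_FIELDS:
        body.pop(field, None)  # no error if the field is absent
    return json.dumps(record, ensure_ascii=False)


def cap_completion_tokens(line: str, limit: int) -> str:
    """Return the request line with body.max_completion_tokens set to `limit`."""
    record = json.loads(line)
    record.setdefault("body", {})["max_completion_tokens"] = limit
    return json.dumps(record, ensure_ascii=False)


def chunk(lines: List[str], size: int) -> List[List[str]]:
    """Split request lines into consecutive groups of at most `size` lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [lines[i:i + size] for i in range(0, len(lines), size)]
```

Each variant would be written to its own file (one line per request, joined with "\n") and submitted as a separate Batch, keeping the untouched 15-request file as the baseline.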
Does this debugging path make sense, or is there another Batch-specific diagnostic I should try first?