Batch API gets stuck `in_progress` how to debugg?

Here is a simplified breakdown of the Batch API issue I’m trying to troubleshoot.

Core problem

A single request works, but a 15-request Batch stalls at in_progress 0/15.

What I have verified:

  • The same request body works synchronously in about 7 seconds.

  • A one-request Batch using the same JSONL line completes.

  • The 15-request Batch validates and moves to in_progress, but stays at 0/15 for over an hour.

  • There is no error_file_id, no failed request count, and no validation error.

  • The JSONL parses cleanly and has unique custom_id values.

  • The 15-request file is about 292 KB.

  • Each request line is about 15.8k–21.8k characters.

  • One measured direct request used about 4.3k prompt tokens and around 500–750 output tokens.

Current hypothesis

This does not look like a basic JSONL syntax problem.

It may be related to Batch-specific scheduling, enqueued prompt-token limits, request complexity, or how multi-request batches are handled for this model/endpoint combination.

The newer batch is also materially larger than the last working one, mainly because the system prompt and QA context grew.

Things I am unsure about

  1. Can a multi-request Batch stay at in_progress 0/N because of Batch queueing or enqueued prompt-token limits, even when a one-request Batch works?

  2. Are prompt_cache_key and prompt_cache_retention safe to include in /v1/chat/completions Batch request bodies, or should I remove them and let caching happen automatically?

  3. Can strict structured outputs (response_format: json_schema, strict: true) contribute to hidden retries or delayed Batch processing, even when the same request works synchronously?

  4. Is there any way to get deeper diagnostics for a Batch that validates but never starts completing requests?

Tests I am considering

I want to change only one thing at a time:

  1. Remove prompt_cache_key and prompt_cache_retention from the JSONL and retry the same 15-request Batch.

  2. Split the same 15 requests into smaller Batch files, for example 2-request or 5-request chunks.

  3. Reduce max_completion_tokens from 1600 to something closer to observed output size, such as 900, mainly as an efficiency test.

  4. If Batch remains unreliable, process this small canary synchronously and keep Batch only for larger stable runs.

Does this debugging path make sense, or is there another Batch-specific diagnostic I should try first?

1 Like