Looking for help understanding what we’re seeing on the Batch API with o3-deep-research-2025-06-26.
What we submitted
- One day of legitimate state-research work across 7 different state slugs
- Each state: 4 separate POST /v1/batches, each batch containing exactly 1 line targeting /v1/responses
- Total legitimate request lines submitted across the whole day: 27 (7 slugs × 4 batches would be 28, but one slug had only 3)
- Our server logs each route invocation and confirms 27 distinct submissions, with no client-side retries
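For reference, here is an illustrative sketch (not our exact production code) of how one such batch line is shaped. The envelope fields (custom_id, method, url, body) follow the documented Batch API input format. The custom_id value, the prompt text, and the tools entry are placeholders, since the real prompt and tool config are not reproduced in this post.

```python
import json


def build_batch_line(custom_id: str, prompt: str) -> str:
    """Return one JSONL line in the Batch API input format, targeting /v1/responses."""
    request = {
        "custom_id": custom_id,  # hypothetical naming scheme, for illustration only
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "o3-deep-research-2025-06-26",
            "input": prompt,
            "max_output_tokens": 100000,
            "max_tool_calls": 30,
            # Placeholder: the actual tool config is not shown in this post.
            "tools": [{"type": "web_search_preview"}],
            "background": False,
            "metadata": {},
        },
    }
    # json.dumps emits a single line, which is what a JSONL file needs.
    return json.dumps(request)


line = build_batch_line("example-state-1", "Placeholder prompt text")
```

Each of our batch input files contained exactly one line like this.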
What we observed in the OpenAI dashboard
For the most recently affected slug alone, the Logs → Responses view shows 629 entries today, all with the same JSONL body that our 4 batches for that slug submitted (same prompt text, same max_output_tokens: 100000, same max_tool_calls: 30, same tool config, background: false, metadata: {}). Roughly 80 of them are full completions with returned content; the rest are 429s returned after we hit the per-model TPD cap.
So 4 batch lines became ~80 billed Deep Research completions, plus several hundred additional /v1/responses log entries that didn’t return content. The same pattern played out earlier in the day across the other 6 state slugs.
Today’s spend on the model: $47.12, which is consistent with dozens of completed Deep Research runs, not the 27 we submitted.
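Since the log entries carry no batch reference, the only way we could group them was by request body. A minimal sketch of that grouping follows, assuming the log entries have been exported as a list of dicts with a "body" key. That export shape is hypothetical; the dashboard does not necessarily provide it in this form.

```python
import hashlib
import json
from collections import Counter


def body_fingerprint(body: dict) -> str:
    """Hash a request body in canonical form so identical bodies collide."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def count_by_fingerprint(entries: list[dict]) -> Counter:
    """Count log entries per distinct request body (hypothetical 'body' key)."""
    return Counter(body_fingerprint(entry["body"]) for entry in entries)


def fan_out_ratio(executions: int, submitted_lines: int) -> float:
    """Executions observed per batch line actually submitted."""
    return executions / submitted_lines


# Numbers from this post, for the most recently affected slug:
print(fan_out_ratio(629, 4))  # all log entries per submitted line -> 157.25
print(fan_out_ratio(80, 4))   # completed runs per submitted line  -> 20.0
```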
What we have ruled out from our side
- We have exactly two /v1/responses submission paths in our code: a direct background: true call and the batch JSONL above. Every one of the mystery resp_… objects has background: false, so the direct path is not the source.
- GET /v1/batches?limit=100 returns 27 batches for the day total. No extra batches on the affected slug. No older Wyoming batches. So fan-out is not “duplicate submissions we forgot about”.
- Server-side logs show exactly one route invocation per slug for the day. No client-side retry loop.
- Webhooks and crons in the codebase are read-only against OpenAI.
Other oddities
- Each batch’s request_counts reads {total: 1, completed: 0, failed: 0}, despite the dashboard showing many /v1/responses executions associated with that batch’s JSONL body. So the counters don’t reflect what actually ran.
- output_file_id and error_file_id are both null on all the batches in question, so the work isn’t surfacing through normal batch output channels even though we’re being billed for it.
- 4 batches for one slug have been stuck in cancelling for 11+ hours after POST /v1/batches/{id}/cancel returned HTTP 200. They never transitioned to cancelled. The other 23 batches show status completed despite having no output file.
- The spawned resp_… objects carry no link back to a parent batch (metadata: {} on every one of them), so there’s no way from a response in the logs to figure out which batch produced it.
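The oddities above can be checked mechanically against the batch objects returned by GET /v1/batches. The sketch below assumes each batch is a plain dict with the documented Batch object fields (status, output_file_id, error_file_id, request_counts, cancelling_at). The helper name, the anomaly labels, and the one-hour threshold are our own illustrative choices.

```python
def find_anomalies(batches: list[dict], now: float, stuck_after_hours: float = 1.0) -> list[tuple]:
    """Flag batches whose status, counters, and output files disagree.

    `now` and `cancelling_at` are Unix timestamps in seconds.
    """
    findings = []
    for batch in batches:
        status = batch.get("status")
        counts = batch.get("request_counts") or {}
        accounted = counts.get("completed", 0) + counts.get("failed", 0)
        no_files = not batch.get("output_file_id") and not batch.get("error_file_id")

        # Completed, yet neither an output file nor an error file exists.
        if status == "completed" and no_files:
            findings.append((batch["id"], "completed_without_files"))

        # Completed, yet completed + failed does not add up to total.
        if status == "completed" and accounted < counts.get("total", 0):
            findings.append((batch["id"], "counts_incomplete"))

        # Cancel was accepted but the batch never reached "cancelled".
        started = batch.get("cancelling_at")
        if status == "cancelling" and started is not None:
            if now - started > stuck_after_hours * 3600:
                findings.append((batch["id"], "stuck_cancelling"))
    return findings
```

Run against our 27 batches, a check like this would be expected to flag the 23 completed batches on the first two conditions and the 4 cancelling ones on the third.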
Questions for the community / OpenAI staff
1. Under what circumstances does a single batch line with total: 1 produce many /v1/responses executions? Is there an internal retry / fan-out policy on the batch worker?
2. Why do batch request_counts not reflect the actual number of /v1/responses executions performed for that batch?
3. Is it expected that a completed batch with no output_file_id still ran billable executions whose outputs were not delivered through the batch API?
4. Should per-line /v1/responses objects inherit the batch envelope’s metadata so they’re attributable? Right now the response object stands alone with no parent reference, which makes incidents like this very hard to diagnose.
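On question 4, one possible client-side stopgap is to put attribution keys into each line's body.metadata before submission, instead of the empty metadata: {} we sent. This is a sketch only: the helper and the key names are hypothetical, and whether responses executed by the batch worker preserve this metadata is an assumption we have not verified.

```python
import copy


def tag_body(body: dict, slug: str, custom_id: str) -> dict:
    """Return a copy of a /v1/responses body with attribution keys in metadata.

    Key names are illustrative. Values are kept as strings, since metadata
    values on the API are strings.
    """
    tagged = copy.deepcopy(body)  # never mutate the caller's body
    metadata = dict(tagged.get("metadata") or {})
    metadata.update({"slug": slug, "custom_id": custom_id})
    tagged["metadata"] = metadata
    return tagged


original = {"model": "o3-deep-research-2025-06-26", "metadata": {}}
tagged = tag_body(original, slug="example-state", custom_id="example-state-1")
```

If the metadata does survive, each resp_… entry in the logs could be traced back to the slug and batch line that produced it.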