Description:
When running ~100 tasks through the Batch API with o4-mini-deep-research, around 30% of requests fail with:
{"error":{"message":"Request blocked.","type":"invalid_request_error","param":null,"code":"invalid_completion"}}
Findings:
- With max_output_tokens=2048, requests succeed but return empty results.
- With max_output_tokens=50000, many requests are blocked with the error above.
- It appears that intermediate web search tokens are counted toward max_output_tokens, not just the final output tokens.
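For reference, a minimal sketch of how the batch input lines were constructed. The custom_id, prompt, endpoint, and tool configuration here are assumptions about a typical setup for this model, not taken from the actual failing job:

```python
import json

# Hypothetical helper: build one Batch API input line targeting the
# Responses API. Field names and the web_search_preview tool are
# assumptions about the setup, not confirmed from the failing requests.
def batch_line(custom_id: str, prompt: str, max_output_tokens: int) -> str:
    body = {
        "model": "o4-mini-deep-research",
        "input": prompt,
        "tools": [{"type": "web_search_preview"}],
        "max_output_tokens": max_output_tokens,
    }
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/responses",
        "body": body,
    })

# One line per task; ~100 of these make up the batch input file.
print(batch_line("task-001", "Example research prompt", 50000))
```

With lines like these, max_output_tokens=2048 yields empty results and max_output_tokens=50000 triggers the "Request blocked." error described above.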
Expected behavior:
max_output_tokens should limit only the length of the final response. Intermediate/tool tokens should not cause requests to be blocked or truncated.
Impact:
This makes it difficult to batch requests reliably with deep research models: low max_output_tokens values produce empty results, while high values trigger blocking, so no setting yields usable output.
Request:
Please confirm whether this is a bug or intended behavior. If intended, please provide guidance on choosing max_output_tokens for deep research models; if a bug, please provide a fix so that batch processing with these models is stable.