I’m experiencing an issue with the Responses API where outputs come back with "status": "incomplete" and "reason": "max_output_tokens", even though max_output_tokens is explicitly set to 25000, in line with OpenAI’s recommendation.
Interestingly, this issue does not occur with the Completions endpoint, even when max_tokens is only 1024.
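For reference, here is a minimal sketch of what the truncated result looks like in my batch output and how I flag it when parsing. The payload is a hand-built stand-in mirroring the fields I see, not a live API call, and the custom_id is a placeholder:

```python
import json

# Hand-built stand-in for one line of batch output containing a
# Responses API result that was cut off (not a live API response).
batch_output_line = json.dumps({
    "custom_id": "req-1",  # placeholder id
    "response": {
        "body": {
            "status": "incomplete",
            "incomplete_details": {"reason": "max_output_tokens"},
        }
    },
})

record = json.loads(batch_output_line)
body = record["response"]["body"]

# Flag results that were truncated despite a generous token budget.
truncated = (
    body.get("status") == "incomplete"
    and body.get("incomplete_details", {}).get("reason") == "max_output_tokens"
)
print(truncated)  # → True
```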
A better error message (like the ones the rate limiter and API validator send back for a normal call) would be helpful in the batch output.
The batch API should not be running the calls at all; the endpoint should be returning an error.
BTW, there is no “set globally”. You have to construct each API call individually as a JSON line, each with its own parameters. I’ll assume it is just a miscommunication and you are already doing that.
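To illustrate, every line of the batch input file is a complete request carrying its own body, so max_output_tokens has to be repeated per line rather than set once for the file. A rough sketch of building that JSONL (the model name and prompts are placeholders):

```python
import json

# Each JSONL line is a self-contained request with its own parameters;
# there is no file-level default to set "globally".
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/responses",
        "body": {
            "model": "gpt-5",  # placeholder model name
            "input": prompt,
            "max_output_tokens": 25000,  # set per request, per line
        },
    }
    for i, prompt in enumerate(["First prompt", "Second prompt"])
]

jsonl = "\n".join(json.dumps(r) for r in requests)
print(len(jsonl.splitlines()))  # → 2
```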
I had been using the Responses endpoint for my batch jobs, but this issue started occurring recently, likely after the GPT-5 release. I’ve since switched back to the Completions endpoint, and it works fine without any errors.
With Responses, failed requests show the reason clearly in the response. With Completions, even if something goes wrong, the request is marked as completed and no error is shown.