Incomplete API responses due to "max_output_tokens" limit during batch processing

I’m experiencing an issue with the responses API where outputs come back with "status": "incomplete" and "reason": "max_output_tokens", even though my max_output_tokens is explicitly set to 25000, following OpenAI’s recommendation.

Interestingly, this issue does not occur when using the completions endpoint, even when max_tokens is only 1024.

Batch body args (sketched as JSON after this list):

  • Model: gpt-4o-mini-2024-07-18
  • Endpoint with issue: responses API (batch)
  • max_output_tokens: 25000
  • temperature: 0.5
  • Error status: "status": "incomplete", "reason": "max_output_tokens"
  • background: false (i.e., this is a blocking request)
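
For concreteness, here is a minimal Python sketch of that request body assembled as JSON. The input text is a placeholder, not from my actual job; the other values mirror the args listed above.

```python
import json

# Values mirror the args listed above; the input text is a placeholder.
body = {
    "model": "gpt-4o-mini-2024-07-18",
    "input": "Your prompt here",  # placeholder
    "max_output_tokens": 25000,
    "temperature": 0.5,
    "background": False,  # i.e., a blocking request
}
print(json.dumps(body, indent=2))
```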

Question:

Why is the responses API prematurely terminating even with such a high token limit?


Let’s investigate the model: gpt-4o-mini has a maximum of 16,384 output tokens.

Since 25000 exceeds that cap, the API calls should be failing on you.

A better message (like the ones the rate limiter and API validator send back for a normal call) would be helpful as a batch return.

The batch API should not be running the calls at all; the endpoint should be returning an error.
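
If that cap is the cause, a client-side workaround is to clamp the requested budget to the model’s documented maximum before building the batch. A minimal sketch (the cap table is my assumption, keyed to this one model):

```python
# Output caps per model; 16,384 for gpt-4o-mini comes from the model docs.
MODEL_OUTPUT_CAPS = {"gpt-4o-mini-2024-07-18": 16_384}

def clamp_max_output_tokens(model: str, requested: int) -> int:
    """Clamp a requested output budget to the model's hard cap, if known."""
    cap = MODEL_OUTPUT_CAPS.get(model)
    return min(requested, cap) if cap is not None else requested

print(clamp_max_output_tokens("gpt-4o-mini-2024-07-18", 25000))  # -> 16384
```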

BTW, there is no “set globally”. You have to construct individual complete API calls as JSON lines, each with their own parameters. I’ll assume it is just a miscommunication, and you are doing that.
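
To illustrate, a minimal sketch of writing such a file, where every line carries its own complete body (the prompts, ids, and the 16000-token budget here are placeholders):

```python
import json

prompts = ["First prompt", "Second prompt"]  # placeholders

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        # Every line is a complete, self-contained request; there is no
        # file-level default that applies across lines.
        request = {
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "gpt-4o-mini-2024-07-18",
                "input": prompt,
                "max_output_tokens": 16000,  # set per request
            },
        }
        f.write(json.dumps(request) + "\n")
```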

I had been using the responses endpoint for my batch jobs, but this issue started occurring recently, likely after the GPT-5 release. I’ve now switched back to the completions endpoint, and it works fine without any errors.

In responses, failed requests report the reason clearly in the response. In completions, by contrast, even if something goes wrong (such as truncation), the request is marked as completed and no error is surfaced.
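
One way to catch that silently on completions is to check finish_reason yourself. A minimal sketch, assuming the standard chat completions response shape (the parsed result here is an illustrative placeholder):

```python
import json

# Illustrative chat completions result, shaped like a line from the batch
# output file (content is a placeholder).
result = json.loads(
    '{"choices": [{"finish_reason": "length",'
    ' "message": {"content": "truncated text..."}}]}'
)

for choice in result["choices"]:
    # finish_reason "length" means the output hit max_tokens, even though
    # the request itself is reported as completed rather than failed.
    if choice["finish_reason"] == "length":
        print("Warning: output was truncated at max_tokens")
```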

I am also receiving this error. I pay $200 per month and I can’t even get a single API call to succeed. It always says max_output_tokens.

You can request a refund at support@openai.com

Try the completions endpoint. It does not fail as much as responses.