gpt-5.4 ignores reasoning_effort="none" when max_completion_tokens is used

Found a bug with gpt-5.4 on the Chat Completions endpoint.

If you pass reasoning_effort: "none", it works as expected (0 reasoning tokens). But if you also pass max_completion_tokens in the same payload, the API ignores the flag entirely, falls back to reasoning, burns through the whole max_completion_tokens budget on invisible reasoning tokens, and returns an empty string with finish_reason: "length".

Repro:

const OpenAI = require("openai");
const openai = new OpenAI();

async function testBug() {
  console.log("Testing with max_completion_tokens...");
  
  // FAILS: Ignores 'none', generates 100 reasoning tokens, and returns ""
  const buggyResponse = await openai.chat.completions.create({
    model: "gpt-5.4-2026-03-05",
    messages: [{ role: "user", content: "Explain the theory of relativity." }],
    reasoning_effort: "none",
    max_completion_tokens: 100 // <-- This breaks it
  });
  console.log("Buggy Reasoning Tokens:", buggyResponse.usage.completion_tokens_details.reasoning_tokens); 
  // Outputs: 100

  console.log("\nTesting WITHOUT max_completion_tokens...");

  // WORKS: 0 reasoning tokens, generates normal text
  const workingResponse = await openai.chat.completions.create({
    model: "gpt-5.4-2026-03-05",
    messages: [{ role: "user", content: "Explain the theory of relativity." }],
    reasoning_effort: "none" 
    // max_completion_tokens omitted entirely
  });
  console.log("Working Reasoning Tokens:", workingResponse.usage.completion_tokens_details.reasoning_tokens); 
  // Outputs: 0
}

testBug().catch(console.error);

The only workaround right now is dropping max_completion_tokens entirely, which obviously isn’t ideal since we lose a hard cap on token costs.
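
In practice that just means conditionally omitting the cap. A minimal sketch (buildChatPayload is only a hypothetical helper name; the real change is the if):

// Hypothetical helper: only attach max_completion_tokens when reasoning_effort
// is not "none", since combining the two is what triggers the bug.
function buildChatPayload({ model, messages, reasoningEffort, maxCompletionTokens }) {
  const payload = { model, messages, reasoning_effort: reasoningEffort };
  if (reasoningEffort !== "none" && maxCompletionTokens != null) {
    payload.max_completion_tokens = maxCompletionTokens;
  }
  return payload;
}

// Usage: openai.chat.completions.create(buildChatPayload({ ... }))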

What about max_output_tokens rather than max_completion_tokens, does that work?

Hi, no, it doesn't work. Passing max_output_tokens returns "BadRequestError 400 Unknown parameter: 'max_output_tokens'."

If you meant the old max_tokens parameter, that's no longer supported with this model either, as it returns:

BadRequestError 400 Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.

Hi!

Thanks for the clear repro and for raising this!

I can confirm:
chat.completions behaves correctly with reasoning_effort: "none" on its own, but once max_completion_tokens is added it ignores "none", spends the whole budget on reasoning tokens, and returns an empty string with finish_reason: "length".

One possible solution is to switch this call to the Responses API and use:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    input="Explain cats to dogs.",
    reasoning={"effort": "none"},
    max_output_tokens=100,
)
print(response.output_text)

That returns normal text with reasoning_tokens: 0.
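
If you're calling from Node like the repro above, the same request looks roughly like this (a sketch assuming the current openai npm package; it would go inside an async function such as testBug):

// Responses API equivalent of the failing Chat Completions call
const response = await openai.responses.create({
  model: "gpt-5.4",
  input: "Explain cats to dogs.",
  reasoning: { effort: "none" },
  max_output_tokens: 100,
});
console.log(response.output_text);
console.log(response.usage.output_tokens_details.reasoning_tokens); // 0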

Thanks for confirming. I looked into the Responses API, but unfortunately it doesn’t support n > 1, which my app needs for generating multiple options at once.

For now, I’m just dropping max_completion_tokens entirely from my chat completions call so it doesn’t break. It’s a bummer losing the hard cap on costs though, so hopefully the team patches the main endpoint soon.
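
If losing the cap gets too expensive, the fallback I may try is emulating n > 1 myself with parallel Responses calls. A rough sketch (assuming the Node SDK's responses.create and output_text, and accepting that each request re-bills the prompt's input tokens):

// Sketch: emulate n > 1 by firing several Responses requests in parallel.
async function generateOptions(prompt, n) {
  const requests = Array.from({ length: n }, () =>
    openai.responses.create({
      model: "gpt-5.4",
      input: prompt,
      reasoning: { effort: "none" },
      max_output_tokens: 100,
    })
  );
  const responses = await Promise.all(requests);
  return responses.map((r) => r.output_text);
}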

The fault seen here is that Chat Completions does not deliver any output if generation is cut off before it is complete. Everything generated is classified as "reasoning" in usage, even when it is clear part of it would have become the final visible output.

On top of that, there actually is some reasoning at "none"; it is just hidden behind a threshold of 128 tokens, below which it is not billed.

I've made posts about this before. Say the model will quite predictably write about 500 tokens. With max_completion_tokens at 300, 400, or 500, you get all "reasoning" and never a non-streamed output. Increase it further and eventually you get the switch to the full output instead of nothing, so you only get what you paid for once the AI has reached the stop sequence and the output is done.
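
You can watch the switch happen with a quick sweep (just a sketch reusing the repro's client from above; the crossover point is an observation, not a documented threshold):

// Sketch: raise the cap step by step and log when visible output appears
// instead of everything being counted as reasoning tokens.
async function sweepCaps() {
  for (const cap of [300, 400, 500, 600, 700, 800]) {
    const r = await openai.chat.completions.create({
      model: "gpt-5.4-2026-03-05",
      messages: [{ role: "user", content: "Explain the theory of relativity." }],
      max_completion_tokens: cap,
    });
    console.log(
      cap,
      r.choices[0].finish_reason,
      r.usage.completion_tokens_details.reasoning_tokens,
      (r.choices[0].message.content || "").length
    );
  }
}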

This symptom has continued on Chat Completions across reasoning models, with no sign that my reporting of it has had any impact. You pay, and then you don't get the partial output.

Hey everyone, thank you for flagging this behaviour. While the workaround for now is to use the Responses API, we have also flagged this to our engineering team and they are looking into it for you. Thank you!
