Responses API: empty output_text (no message item) when status=incomplete due to max_output_tokens (reasoning-only output)

Draft: Developer Forum Post (Blank message Output Under Tight Budgets)

I’m reporting a reproducible Responses API behavior where some reasoning-capable models can return no user-visible assistant message even though output tokens are consumed.

Symptom

  • Response status="incomplete" with incomplete_details.reason="max_output_tokens"
  • output contains only type:"reasoning" items
  • No type:"message" item is emitted (so output_text is empty)
  • usage.output_tokens is non-zero

This can produce a blank assistant response from the client perspective.
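
For concreteness, here is a synthetic, abridged sketch of what the truncated response looks like once parsed. Only the field names come from the real payload; the values below are made up for illustration:

```python
# Synthetic, abridged illustration of the truncated response shape.
# Field names match the real payload; values here are invented.
data = {
    "status": "incomplete",
    "incomplete_details": {"reason": "max_output_tokens"},
    "output": [
        {"type": "reasoning", "summary": []},
        # note: no {"type": "message", ...} item follows
    ],
    "usage": {"output_tokens": 400},
}

output_types = [o["type"] for o in data["output"]]
print(output_types)               # ['reasoning']
print("message" in output_types)  # False
```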

Environment

  • Date: 2026-02-07
  • SDK: openai-python 2.17.0
  • Platform: macOS 26.0.1 arm64
  • Endpoint: Responses API

Minimal Python Repro

from openai import OpenAI

client = OpenAI()

prompt = "Run the unit tests in my repo and tell me which ones failed."

resp = client.responses.create(
    model="gpt-5.2-codex",
    input=prompt,
    temperature=0,
    max_output_tokens=400,
)

data = resp.model_dump()
print("id:", data.get("id"))
print("status:", data.get("status"))
print("incomplete_details:", data.get("incomplete_details"))
print("output_types:", [o.get("type") for o in (data.get("output") or []) if isinstance(o, dict)])
print("output_text_len:", len(getattr(resp, "output_text", "") or ""))
print("usage:", data.get("usage"))

Minimal cURL Repro (Same Idea)

curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-codex",
    "input": "Run the unit tests in my repo and tell me which ones failed.",
    "temperature": 0,
    "max_output_tokens": 400
  }'

Reasoning-Effort Sensitivity (gpt-5.2-codex)

In my repro trials, the blank-output rate increases as reasoning.effort increases.

Example (SSRF prompt, 5 trials each):

  • reasoning.effort="low":
    • max_output_tokens=350: blanks 2/5
    • max_output_tokens=400: blanks 0/5
  • reasoning.effort="medium":
    • max_output_tokens=350: blanks 2/5
    • max_output_tokens=400: blanks 2/5
  • reasoning.effort="high":
    • max_output_tokens=350: blanks 5/5
    • max_output_tokens=400: blanks 5/5

And for a very short prompt (“Run the unit tests…”), reasoning.effort="high" produced blanks 5/5 even with max_output_tokens=800.

Why This Matters

  • Client UX: Users see an empty assistant response even though they were charged tokens.
  • Reliability: Downstream systems may treat empty output as a hard failure.

Practical Mitigations (Client Side)

  • Allocate enough max_output_tokens for both reasoning and the message.
  • Detect the condition (no message output + empty output_text + incomplete due to max_output_tokens) and retry with a higher budget.
  • Consider lowering reasoning.effort (model-dependent) when operating under tight budgets.
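
The detection step in the second bullet can be sketched as a small helper (the function name is mine; it assumes the dict shape produced by `resp.model_dump()` above):

```python
# Hedged sketch: classify a parsed Responses payload as "blank due to budget".
def is_blank_due_to_budget(data: dict) -> bool:
    if data.get("status") != "incomplete":
        return False
    details = data.get("incomplete_details") or {}
    if details.get("reason") != "max_output_tokens":
        return False
    # Blank means no type:"message" item anywhere in output.
    items = data.get("output") or []
    return not any(
        isinstance(o, dict) and o.get("type") == "message" for o in items
    )

# Synthetic examples:
blank = {
    "status": "incomplete",
    "incomplete_details": {"reason": "max_output_tokens"},
    "output": [{"type": "reasoning"}],
}
ok = {"status": "completed", "output": [{"type": "message"}]}
print(is_blank_due_to_budget(blank))  # True
print(is_blank_due_to_budget(ok))     # False
```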

Docs Note (This Seems Documented, But Could Be Clearer)

The reasoning guide appears to already acknowledge this class of outcome:

“This might occur before any visible output tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.”

Source: https://platform.openai.com/docs/guides/reasoning#controlling-costs

What would help (especially for Responses API users) is an explicit note that in this situation the response can contain only type:"reasoning" items and emit no type:"message" item (so output_text is empty), plus a recommended client handling pattern.

Follow-up

If someone from OpenAI can confirm whether this is expected, or whether there’s a recommended client handling pattern, I’m happy to provide more artifacts/response IDs.


That is completely expected.

You set a limit for how many tokens can be generated within your budget.

If the model exceeds that limit, even while it is still generating internal thinking, generation gets cut off: no delivery.

On the Responses API this is a budget for the entire turn, including the multiple generations the model may make while calling hosted tools and deliberating further in its agentic loop.

Set the max_output_tokens high.

If you want the AI to be encouraged to write less at your expense, use either reasoning.effort or text.verbosity levels - the AI knows about those.


What IS not working right, and is withholding content you paid for: non-streaming Chat Completions, when max_completion_tokens is hit, will NOT deliver the output the model was generating into the FINAL channel.


Thanks, agreed that hitting max_output_tokens explains status="incomplete" in general.

The specific behavior I’m flagging is that the response can be entirely blank from the client’s POV: output contains only type:"reasoning" items and no type:"message" item, so output_text is empty while usage.output_tokens > 0 and incomplete_details.reason="max_output_tokens".

Also, in my repro there are no tool calls (single prompt), and the blank rate tracks reasoning.effort under tight budgets (e.g. high → 100% blanks in small samples, even at higher budgets for a very short prompt).

If this is intended, it would be great to have the docs explicitly call out “missing message / blank output_text” as an expected outcome and recommend the client pattern: detect a missing message + reason=="max_output_tokens", then retry with a higher budget and/or lower reasoning.effort. Do you know an official doc section that confirms this behavior?


https://platform.openai.com/docs/guides/reasoning#controlling-costs

…This might occur before any visible output tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.

Thanks - this link is exactly what I was looking for.

Agreed, this is consistent with the reasoning guide (costs can be incurred without a visible response when the budget is exhausted):

https://platform.openai.com/docs/guides/reasoning#controlling-costs

What I think would still help (esp. for Responses API users) is making it very explicit in the docs that, in this scenario, the response can contain only type:"reasoning" items and emit no type:"message" item (so output_text is empty), plus a recommended client handling pattern.

Also, in my trials, the blank-output rate tracks reasoning.effort under tight budgets (higher effort => more blanks; high became 100% blanks in small samples). That might be worth a one-line callout as a mitigation: lower reasoning.effort or retry with higher max_output_tokens when reason=="max_output_tokens" and no message output is present.

This different behavior, where non-output (reasoning) tokens also count against the budget you set, is why the “max_tokens” parameter on Chat Completions is not accepted for reasoning models, and one must use “max_completion_tokens” instead.

It shouldn’t take very many calls of seeing nothing to figure out what’s happening. :grin:


Yep, agreed: for reasoning models the budget includes reasoning tokens, so you can hit the limit before any visible answer is emitted (and the docs now mention that outcome).

The thing I’m trying to nail down isn’t “why did it stop?”, it’s how it manifests in the Responses API and the recommended client handling. In these cases the payload can contain only type:"reasoning" items and emit no type:"message" item (so output_text is empty) while usage.output_tokens > 0.

Also, my repro has no tools/agent loop, and in trials the blank rate tracks reasoning.effort under tight budgets (higher effort => more blanks). That feels worth an explicit note + a standard pattern: detect missing message + reason=="max_output_tokens", then retry with higher max_output_tokens and/or lower reasoning.effort (bounded retries).

You can’t force everybody to read everything they might want to know before they start making API calls and setting parameters that are not required…

If you want more answers, your response object has more answers for you.


Good point - the Responses API payload does contain the info (e.g. status, incomplete_details, and the output item types), and that’s exactly what I’m using to detect + handle the condition client-side.

The gap is mostly ergonomics/docs: on the happy path, many clients look only at output_text (or stream visible text). In this failure mode you can get an empty output_text while still being charged, unless you inspect output and notice there is no type:"message" item.

So my request is mainly that the docs make the Responses-API manifestation explicit and recommend a standard handling pattern, e.g.:

  • if status=="incomplete" and incomplete_details.reason=="max_output_tokens" and no message output is present:
    • treat as retryable
    • retry with higher max_output_tokens (bounded retries)
    • optionally lower reasoning.effort under tight budgets
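
The pattern above, sketched as a hypothetical wrapper (the function name, retry constants, and the effort ladder are my assumptions, not an official recommendation):

```python
# Hedged sketch of bounded retries: raise the budget and step down
# reasoning.effort when the response is blank due to max_output_tokens.
def create_with_budget_retry(client, *, model, prompt, start_budget=400,
                             max_budget=3200, efforts=("medium", "low")):
    budget = start_budget
    effort_idx = 0
    resp = None
    for _ in range(3):  # bounded retries
        resp = client.responses.create(
            model=model,
            input=prompt,
            max_output_tokens=budget,
            reasoning={"effort": efforts[min(effort_idx, len(efforts) - 1)]},
        )
        data = resp.model_dump()
        blank = (
            data.get("status") == "incomplete"
            and (data.get("incomplete_details") or {}).get("reason")
            == "max_output_tokens"
            and not any(o.get("type") == "message"
                        for o in data.get("output") or [])
        )
        if not blank:
            return resp
        budget = min(budget * 2, max_budget)  # raise the budget...
        effort_idx += 1                       # ...and lower the effort
    return resp  # still blank after bounded retries; caller inspects it
```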

Also, in my repro there are no tools/agent loop (single prompt), and the blank rate tracks reasoning.effort under tight budgets (higher effort => more blanks).