Draft: Developer Forum Post (Blank Message Output Under Tight Budgets)
I’m reporting a reproducible Responses API behavior where some reasoning-capable models can return no user-visible assistant message even though output tokens are consumed.
Symptom
- Response `status="incomplete"` with `incomplete_details.reason="max_output_tokens"`
- `output` contains only `type:"reasoning"` items
- No `type:"message"` item is emitted (so `output_text` is empty)
- `usage.output_tokens` is non-zero

From the client's perspective, this produces a blank assistant response.
Environment
- Date: 2026-02-07
- SDK: openai-python 2.17.0
- Platform: macOS 26.0.1 arm64
- Endpoint: Responses API
Minimal Python Repro
```python
from openai import OpenAI

client = OpenAI()

prompt = "Run the unit tests in my repo and tell me which ones failed."

resp = client.responses.create(
    model="gpt-5.2-codex",
    input=prompt,
    temperature=0,
    max_output_tokens=400,
)

data = resp.model_dump()
print("id:", data.get("id"))
print("status:", data.get("status"))
print("incomplete_details:", data.get("incomplete_details"))
print("output_types:", [o.get("type") for o in (data.get("output") or []) if isinstance(o, dict)])
print("output_text_len:", len(getattr(resp, "output_text", "") or ""))
print("usage:", data.get("usage"))
```
Minimal cURL Repro (Same Idea)
```shell
curl https://api.openai.com/v1/responses \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.2-codex",
    "input": "Run the unit tests in my repo and tell me which ones failed.",
    "temperature": 0,
    "max_output_tokens": 400
  }'
```
Reasoning-Effort Sensitivity (gpt-5.2-codex)
In my repro trials, the blank-output rate increases as `reasoning.effort` increases, holding the prompt and token budget fixed.
Example (SSRF prompt, 5 trials each):
`reasoning.effort="low"`:
- `max_output_tokens=350`: blanks 2/5
- `max_output_tokens=400`: blanks 0/5

`reasoning.effort="medium"`:
- `max_output_tokens=350`: blanks 2/5
- `max_output_tokens=400`: blanks 2/5

`reasoning.effort="high"`:
- `max_output_tokens=350`: blanks 5/5
- `max_output_tokens=400`: blanks 5/5
And for a very short prompt (“Run the unit tests…”), reasoning.effort="high" produced blanks 5/5 even with max_output_tokens=800.
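Blank rates like the tallies above could be computed with a small helper along these lines; the API-calling trial loop is omitted, and the function name and `"k/n"` formatting are my own choices, not part of the SDK:

```python
def blank_rate(dumps: list[dict]) -> str:
    """Fraction of dumped Responses API payloads that contain no
    visible type:"message" item, formatted "k/n" to match the
    tallies above."""
    def is_blank(d: dict) -> bool:
        types = [o.get("type") for o in (d.get("output") or []) if isinstance(o, dict)]
        return "message" not in types

    blanks = sum(is_blank(d) for d in dumps)
    return f"{blanks}/{len(dumps)}"
```

Feed it the `resp.model_dump()` dict from each trial and it reproduces the "blanks k/n" notation used here.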
Why This Matters
- Client UX: Users see an empty assistant response even though they were charged tokens.
- Reliability: Downstream systems may treat empty output as a hard failure.
Practical Mitigations (Client Side)
- Allocate enough `max_output_tokens` for both reasoning and the message.
- Detect the condition (no `message` output + empty `output_text` + incomplete due to `max_output_tokens`) and retry with a higher budget.
- Consider lowering `reasoning.effort` (model-dependent) when operating under tight budgets.
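One way to implement the detect-and-retry mitigation is sketched below. `is_blank_incomplete` checks the dumped response for the three conditions above, and the wrapper doubles the budget until a message appears or a cap is hit; the helper names, the doubling policy, and the cap value are my own illustrative choices, not an official pattern:

```python
def is_blank_incomplete(data: dict) -> bool:
    """True when a dumped response was cut off by max_output_tokens
    and emitted reasoning items but no visible message item."""
    if data.get("status") != "incomplete":
        return False
    details = data.get("incomplete_details") or {}
    if details.get("reason") != "max_output_tokens":
        return False
    types = {o.get("type") for o in (data.get("output") or []) if isinstance(o, dict)}
    return "message" not in types


def create_with_retry(client, budget=400, max_budget=3200, **kwargs):
    """Retry responses.create with a doubled token budget while the
    response stays blank. (Illustrative policy; tune the starting
    budget and cap for your workload.)"""
    while True:
        resp = client.responses.create(max_output_tokens=budget, **kwargs)
        if not is_blank_incomplete(resp.model_dump()) or budget >= max_budget:
            return resp
        budget = min(budget * 2, max_budget)
```

Note the retry re-bills input tokens each attempt, so detecting once and surfacing a clear error may be preferable in cost-sensitive paths.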
Docs Note (This Seems Documented, But Could Be Clearer)
The reasoning guide appears to already acknowledge this class of outcome:
“This might occur before any visible output tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.”
Source: https://platform.openai.com/docs/guides/reasoning#controlling-costs
What would help (especially for Responses API users) is an explicit note that in this situation the response can contain only type:"reasoning" items and emit no type:"message" item (so output_text is empty), plus a recommended client handling pattern.
Follow-up
If someone from OpenAI can confirm whether this is expected, or whether there’s a recommended client handling pattern, I’m happy to provide more artifacts/response IDs.
