O4-mini returns an empty response because reasoning tokens used the entire completion budget

Hi, this is Lifan from Aissist. I’ve noticed that when using O4-mini, there’s a small but recurring issue where the response is empty and the finish_reason is length.

In the example below, I set the max completion tokens to 3072. However, the model used all 3072 tokens as reasoning tokens, leaving none for actual content generation. I initially had the limit set to 2048 and observed the same issue, so I increased it to 3072, but it is still happening. I also set the reasoning effort to low, and sometimes retrying the same request resolves the issue, but not always.

Does anyone know why this is occurring, or if there’s a way to prevent all tokens from being consumed purely for reasoning?

ChatCompletion(id='chatcmpl-CHXjJdaUN3ahZBpet3wPedM7ZtSRe', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None, annotations=[]), content_filter_results={})], created=1758297269, model='o4-mini-2025-04-16', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=3072, prompt_tokens=10766, total_tokens=13838, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=3072, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)), prompt_filter_results=[{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}])

The API’s max_tokens parameter was renamed to max_completion_tokens for just this reason.

The rename signals that you are no longer specifying the maximum length of the visible response you want to receive. Instead, you are specifying the maximum budget you are willing to pay for output tokens, which covers both the visible output and the internal reasoning that is billed as output.
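The response object already reports how that budget was split. Here is a minimal sketch of reading the breakdown from a Chat Completions response with the openai Python SDK; `resp` stands in for whatever your own create() call returned:

```python
# Minimal sketch: inspect how the completion budget was spent.
# "resp" is assumed to be the object returned by
# client.chat.completions.create(...) with the openai Python SDK.
usage = resp.usage
reasoning = usage.completion_tokens_details.reasoning_tokens
visible = usage.completion_tokens - reasoning

print(f"completion tokens billed: {usage.completion_tokens}")
print(f"  reasoning tokens:       {reasoning}")
print(f"  visible output tokens:  {visible}")

# The failure mode in the post above: the whole budget went to reasoning.
if resp.choices[0].finish_reason == "length" and not resp.choices[0].message.content:
    print("Budget exhausted by reasoning before any visible output was produced.")
```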

If you want to reduce the thinking a bit, use the "reasoning_effort" parameter on Chat Completions. This tells the model, in effect, "don't think so hard before responding."

If you send max_completion_tokens at all, it is better to set it high, for example 30000, so that it only serves to prevent runaway token generation rather than cutting off normal responses.
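As a minimal sketch of that combination (assuming the openai Python SDK; the model name, prompt, and the 30000 budget are just illustrative):

```python
# Minimal sketch: low reasoning effort plus a generous completion budget.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="low",        # ask the model to spend fewer reasoning tokens
    max_completion_tokens=30000,   # generous budget: reasoning + visible output
    messages=[{"role": "user", "content": "Summarize the ticket in two sentences."}],
)

print(resp.choices[0].finish_reason)   # should be "stop" rather than "length"
print(resp.choices[0].message.content)
```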

Hi, we are using Chat Completions, and we are already using max_completion_tokens. Here are the parameters we used for the request:
'escalation_strategy': 'neutral', 'temperature': 0.6, 'top_p': 0.3, 'frequency_penalty': 0.6, 'presence_penalty': 0.6, 'max_completion_tokens': 3072, 'reasoning_effort': 'low'