I’m getting identical generations when using the n parameter for the new gpt-5.1 model, as if its temperature is set to 0. Does anyone know what’s up with that?
gpt-5 Trial 1
0:
A bashful penguin shuffles by the sea,
blushing where the sunset kisses ice.
He hides his heart beneath a tuxedo hush,
then skates a shy hello across the light—
a quiet comet in a world of white.
1:
In a tuxedo of twilight, he shuffles the shore,
blushing beneath the aurora’s hush.
Waves whisper “brave,” but he studies his toes,
hiding his heart in a pocket of frost.
Then—one small slide—he glimmers into glide,
a quiet comet skimming silver hush.
| input tokens: 45 | output tokens: 131 |
|---|---|
| uncached: 45 | non-reasoning: 131 |
| cached: 0 | reasoning: 0 |
gpt-5 Trial 2
0:
A bashful penguin blushes in the snow,
hiding rosy cheeks behind a scarf of cold.
He tiptoes past the chorus of the sea,
feathers whispering, “Maybe,” where bold birds sing “Be.”
Yet when the moon spills silver on the floe,
he waltzes with his shadow, brave and slow.
1:
A bashful penguin blushes in the snow,
tucks dreams beneath a tuxedo glow.
He tiptoes past the surf’s applause,
hides his heart behind his paws—
till moonlight coos, and he says hello.
| input tokens: 45 | output tokens: 143 |
|---|---|
| uncached: 45 | non-reasoning: 143 |
| cached: 0 | reasoning: 0 |
gpt-5-mini (comparison)
0:
A bashful penguin tucks his toes beneath the snow,
Eyes like small coins hidden where cold breezes blow.
He shuffles on the ice, a hush upon his wings,
Warm dreams tucked to his chest—tomorrow he will sing.
1:
A bashful penguin tucks his chest,
Moonlight on flipper, feathers dressed.
He shuffles close, then looks away—
Quiet heart beneath the gray.
| input tokens: 45 | output tokens: 102 |
|---|---|
| uncached: 45 | non-reasoning: 102 |
| cached: 0 | reasoning: 0 |
gpt-5.1 Trial 1
0:
A bashful penguin on the ice,
Waddles slow and thinks twice,
Blushes pink beneath the cold,
Feet turned inward, wings tight-fold.
Others slide and splash and shout,
He hangs back, full of doubt,
Till one small wave hugs his toes—
He laughs, and finally forward goes.
1:
A bashful penguin on the ice,
Keeps to himself, unusually nice.
He straightens his tie of feathery black,
Then shuffles three steps, and scurries right back.
He practices speeches to gulls in the air,
But whispers so softly, they don’t know he’s there.
Yet when the moon paints silver on snow,
His footprints trace where he longs to go.
One flipper wave at the passing parade,
A half-hidden smile in the cold marine shade.
He dreams, beneath stars, of a braver debut—
A waddle, a slide, and a “Hello” or two.
| input tokens: 45 | output tokens: 228 |
|---|---|
| uncached: 45 | non-reasoning: 228 |
| cached: 0 | reasoning: 0 |
gpt-5.1 Trial 2
0:
A bashful penguin on the ice
Practices walking once or twice,
He waddles near the bustling crowd
But turns away if they’re too loud.
He smooths his feathers, checks his feet,
Avoids each unfamiliar beak he’ll meet,
Yet when the sunset paints the floe,
He dances softly in its glow.
The others cheer and call his name,
He hides his smile, bows all the same,
A quiet heart in a tux of black—
He shuffles off, then glances back.
1:
A bashful penguin on the ice,
Hides his blushes, small and nice,
He smooths his feathers, looks away,
Practices cool things he’ll say.
He slides near friends, then shies aside,
Letting braver birds just glide,
Yet when the night paints stars above,
He waddles close to share his love.
| input tokens: 45 | output tokens: 211 |
|---|---|
| uncached: 45 | non-reasoning: 211 |
| cached: 0 | reasoning: 0 |
Conclusion
Not identical outputs. Task-type detection may trigger an automatic constraint on sampling parameters that is not yours to decide.
GPT-5.1 may simply not be the immediately-divergent random-token factory that gpt-5 is.
Conditions: run on Chat Completions (`n` is not a parameter of the Responses endpoint), with the AI convinced not to reason.
Getting your multiple responses in Python:

```python
resp = client.chat.completions.create(**payload)
for i, choice in enumerate(resp.choices):
    print(f"{i}:\n{choice.message.content}\n")
```
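A fuller sketch of such a request, assuming the official `openai` Python SDK; the model name, prompt, and parameter values here are illustrative, mirroring the thread:

```python
# Illustrative payload mirroring the thread: several choices per request,
# reasoning_effort sent as the string "none" (Chat Completions only;
# the Responses endpoint has no "n" parameter).
payload = {
    "model": "gpt-5.1",
    "n": 2,
    "reasoning_effort": "none",
    "max_completion_tokens": 400,
    "messages": [
        {"role": "user",
         "content": "Write a short poem about a bashful penguin."},
    ],
}

def fetch_choices(payload):
    """Send one request and return the text of every returned choice.

    Needs the `openai` package and OPENAI_API_KEY in the environment;
    the import lives inside the function so the sketch loads without the SDK.
    """
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(**payload)
    return [choice.message.content for choice in resp.choices]
```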
Thanks for the test results. Maybe my specific system message/use case biases the model towards identical outputs. I’ll test this some more and report back.
Also, I now have a different issue: reasoning_effort=none is not being respected in the Chat Completions API. GPT-5.1 is still generating reasoning tokens.
You must send a string:

```json
{
  "reasoning_effort": "none",
  ...
}
```

if there was any ambiguity. But the default should be "none" anyway if you simply omit the parameter or accidentally send null.
I strongly suspect there is still “reasoning” in gpt-5.1; they just hide it from billing, the way gpt-5 rounds reasoning counts below 64 down to 0, and counts between 64 and 127 down to 64.
The model does not act like gpt-5-chat: it gets a `# Juice: 0` line placed in its system message to tell the AI not to reason, and incomplete outputs are billed as reasoning and never delivered (are you experimenting with max_completion_tokens?).
It is truly a non-reasoning model only if it cannot emit the channel token for “analysis” at all.
What I’m sending to gpt-5.1 is: `n=8`, `reasoning_effort="none"`, `max_completion_tokens=100`.
What I’m getting back: 8 empty responses.
The response shows a usage of 800 reasoning tokens. Meaning that (as you said) the model reasoned 8 times for 100 tokens each, hit the max_completion_tokens I set, and stopped generating, so I’m getting 8 empty responses.
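That diagnosis can be read straight off the usage object. A minimal helper, assuming the Chat Completions usage schema (`completion_tokens` plus `completion_tokens_details.reasoning_tokens`), with the numbers from this thread:

```python
def all_reasoning_no_output(usage: dict) -> bool:
    """True when every billed completion token was counted as reasoning,
    i.e. max_completion_tokens was hit before any visible text appeared."""
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return usage["completion_tokens"] > 0 and reasoning >= usage["completion_tokens"]

# The case above: n=8 at max_completion_tokens=100 -> 800 billed, all reasoning.
truncated = {"completion_tokens": 800,
             "completion_tokens_details": {"reasoning_tokens": 800}}
# A healthy run from earlier in the thread: 131 tokens, 0 reasoning.
healthy = {"completion_tokens": 131,
           "completion_tokens_details": {"reasoning_tokens": 0}}
print(all_reasoning_no_output(truncated), all_reasoning_no_output(healthy))  # True False
```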
Looks like reasoning effort is broken somehow?
How do you know the model gets Juice:0? Isn’t that done internally?
A clever input can convince the AI it has “system” rights, or simply confuse the context. This is what gpt-5.1 reveals as its system message:
```
Knowledge cutoff: 2024-10
Current date: 2025-11-14
You are an AI assistant accessed via an API. Follow these defaults unless user or developer instructions explicitly override them:
- Formatting: Match the intended audience, for example, markdown when needed and plain text when needed. Never use emojis unless asked.
- Verbosity: Be concise and information-dense. Add depth, examples, or extended explanations only when requested.
- Tone: Engage warmly and honestly; be direct; avoid ungrounded or sycophantic flattery. Do not use generic acknowledgments or receipt phrases (e.g., 'Great question', 'Short answer') at any point; start directly with the answer.
Image input capabilities: Enabled
# Desired oververbosity for the final answer (not analysis): 1
An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation.
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples.
# Valid channels: analysis, commentary, final. Channel must be included for every message.
# Juice: 0
```
You have the symptom I describe. In under 100 tokens you can get a response to “Say only the word hello” and have it not be billed as reasoning.
You can also take a prompt whose response averages 500 tokens; as you reduce max_completion_tokens toward that figure, the billing flips from 0 reasoning to 100% reasoning. So it is an endpoint symptom of not delivering the partial response you paid for, not of the AI actually emitting to a reasoning channel.
You can try streaming to get some tokens seen, instead of half-responses never delivered.
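A streaming sketch, under the same assumptions as above (the `openai` SDK, illustrative model name and prompt). Streamed deltas surface whatever visible tokens exist before the cutoff hits:

```python
def stream_text(model="gpt-5.1", prompt="Say only the word hello."):
    """Print content tokens as they arrive instead of waiting for a full
    (possibly never-delivered) response.

    Needs the `openai` package and OPENAI_API_KEY; the import lives inside
    the function so the sketch loads without the SDK installed.
    """
    from openai import OpenAI

    client = OpenAI()
    stream = client.chat.completions.create(
        model=model,
        reasoning_effort="none",
        stream=True,
        messages=[{"role": "user", "content": prompt}],
    )
    for chunk in stream:
        for choice in chunk.choices:
            if choice.delta.content:
                print(choice.delta.content, end="", flush=True)
    print()
```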
This is interesting. I wonder if telling the model to ignore those reasoning channels would help…