GPT-5.1 is the newest flagship model in the GPT-5 model family. Our most intelligent model yet, GPT-5.1 has dedicated training for:
Code generation, bug fixing, and refactoring
Instruction following
Long context and tool calling
GPT-5.1 and gpt-5.1-chat-latest will be available on all paid API tiers at the same price and rate limits as GPT-5. We’re also releasing gpt-5.1-codex and gpt-5.1-codex-mini, optimized for longer-running, agentic coding tasks in Codex-like environments.
Users can now use GPT‑5.1 without reasoning by setting reasoning_effort to 'none'. This makes the model behave like a non-reasoning model for latency-sensitive use cases, while keeping the high intelligence of GPT‑5.1, with the added bonus of performant tool calling.
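For illustration, here is a minimal sketch of the 'none' setting via the Chat Completions API, assuming the standard openai Python SDK (the prompt is hypothetical):

```python
# Minimal sketch: disable reasoning on GPT-5.1 for a latency-sensitive call.
# Assumes the standard openai Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5.1",
    reasoning_effort="none",  # behaves like a non-reasoning model
    messages=[{"role": "user", "content": "Summarize this ticket in one sentence."}],
)
print(response.choices[0].message.content)
```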
Extended prompt caching (up to 24 hours) reduces cost and latency for long-running interactions. To use extended caching with GPT‑5.1, add the parameter prompt_cache_retention='24h' on the Responses or Chat Completions API. See the prompt caching docs for more detail.
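As a sketch, extended caching on the Responses API might look like this, again assuming the standard openai Python SDK (the input text is a placeholder):

```python
# Minimal sketch: opt into 24-hour prompt cache retention on the Responses API.
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.1",
    prompt_cache_retention="24h",  # keep the cached prompt prefix for up to 24 hours
    input="...long, stable system prompt followed by the user query...",
)
print(response.output_text)
```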
Please refer to our API docs for more information! A prompting guide has also been released to help you transition to this new model.
I went through the available docs and announcements for this release and didn’t find any mention of 5.1-mini or nano, besides the codex-mini version.
My expectation is that these variants will follow soon, possibly alongside the gpt-5.1-pro release. I would be surprised if they weren’t already in the pipeline.
Yes, I think that’s one of the improvements 5.1 brings to the table, and it’s such a small detail.
That leads to the next point: the amount of documentation for this release is enormous. I kept digging deeper into a rabbit hole of information covering every aspect and feature.
The inconsistencies on the model pages have already been raised with the team.
Does the model have the same fault in non-streaming mode as GPT-5: delivering no output at all when generation is incomplete?
For example, on GPT-5.1 right now: set max_completion_tokens to 60, and no output is delivered, with the whole budget billed as reasoning (a reproduction sketch follows the figures below):
input tokens: 20385 (cached: 20096, uncached: 289)
output tokens: 60 (reasoning: 60, non-reasoning: 0)
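A minimal reproduction sketch of the symptom, assuming the standard openai Python SDK and the Chat Completions API (the prompt and token cap are chosen only to force truncation):

```python
# Minimal reproduction sketch: a max_completion_tokens cap too small to finish.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-5.1",
    max_completion_tokens=60,  # deliberately too small for the request below
    messages=[{"role": "user", "content": "Write a 500-word essay on prompt caching."}],
)

choice = response.choices[0]
print(choice.finish_reason)                       # expected: "length"
print(repr(choice.message.content))               # reported symptom: empty content
print(response.usage.completion_tokens_details)   # all 60 output tokens counted as reasoning
```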
The numeric value differs from gpt-5: on gpt-5, partial or minimal reasoning is quantized to 64-token units, apparently to obfuscate the amount of reasoning actually done and/or to report it as 0 (as seen at release). Yet the symptom is the same as gpt-5: you cannot receive even 1000 tokens of final output if generation is incomplete. Any portion that would be user-visible is never sent, yet is still billed as reasoning.
So:
BUG: the model's delivery of incomplete "final" generations must be fixed, so you get everything you paid for.
(gpt-5-chat-latest performs correctly, delivering partial outputs, because it is truly non-reasoning.)