The most advanced frontier model for professional work and long-running agents.
GPT-5.2 brings stronger performance on complex, multi-step tasks. It is better at building spreadsheets and presentations, writing code, interpreting images, and working with long contexts.
Agentic coding takes a major step forward, making GPT-5.2 the leading model in its price range and the new default for tools like Windsurf.
The Thinking variant is more reliable, with about 30 percent fewer factual errors. It hallucinates less, which makes it more dependable for research and analysis.
Long-context reasoning reaches a new high. GPT-5.2 nearly solves the 4-needle MRCR benchmark and clearly outperforms GPT-5.1 when analyzing very long documents.
The API offers no facility for telling the model what output modality you want, and no response events other than those for image tool use. So it cannot output images directly.
While one AMA response indicated that the gpt-5.1 model can generate images … so can gpt-4o. That modality is not exposed except through the specially trained gpt-image-1 model, which is completely wrapped in tools, image endpoints, and safety layers.
I have a question about GPT-5.2 Pro on the official website. The UI only shows two reasoning/thinking options (e.g., “Thinking time: Standard” and “Extended”). Which API thinking_effort levels do these correspond to—medium + high, or high + xhigh?
Turn off your ad blocker and any custom blocks that refuse feature gates and tracking requests, then hard-refresh. That should restore the clear “Pro” designation in the ChatGPT message input area, as reported.
You might also get a broken, non-functional “thinking” modal popup that blocks the entire UI when the model starts generating a response.
The API reasoning-effort values supported for gpt-5.2-pro, with their internal mapping, are medium: 64, high: 256, and xhigh: 768. These values are not affected by whether the pro or non-pro model is used.
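That mapping can be expressed as a simple lookup. A minimal sketch follows; the numeric values are the internal ones quoted above (reported in this thread, not official documentation), and the request-body shape assumes the Responses API's `reasoning.effort` field:

```python
# Reasoning-effort levels reported for gpt-5.2-pro, with the
# internal values quoted above. Illustrative only: the numbers
# come from this thread, not from official documentation.
EFFORT_MAP = {"medium": 64, "high": 256, "xhigh": 768}

def effort_value(effort: str) -> int:
    """Return the reported internal value for a supported effort level."""
    if effort not in EFFORT_MAP:
        raise ValueError(
            f"unsupported reasoning effort {effort!r}; "
            f"choose one of {sorted(EFFORT_MAP)}"
        )
    return EFFORT_MAP[effort]

# Sketch of a request body (parameter shape assumed; note that
# service_tier "flex" is reported as unsupported for this model):
request = {
    "model": "gpt-5.2-pro",
    "reasoning": {"effort": "high"},
    "input": "Summarize the attached report.",
}
```

Passing any other effort string would be rejected client-side before a request is even sent.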
service_tier: “flex” is not supported.
Which levels ChatGPT corresponds to is not worth asking (“cough” 512/768), since OpenAI can make ChatGPT, its test-time compute, and the model itself dynamic, scaling to meet computation demand.
I cannot say for certain, as the available options differ across my devices and browsers. This could be due to A/B testing, the rollout itself, or a browser-related issue, as mentioned above.
If I were to test this, I would send a few requests via the API with different settings and compare the thinking times against ChatGPT. That said, I cannot make a definitive statement at this point.
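A minimal timing harness for that comparison might look like the following sketch. Here `send_request` is a hypothetical stand-in for the real API call (in practice you would call the OpenAI client with a different `reasoning.effort` per run); the harness itself only measures wall-clock time per effort level:

```python
import time
from typing import Callable, Dict, Iterable

def time_request(send_request: Callable[[str], object], effort: str) -> float:
    """Time one request at the given reasoning-effort level and
    return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    send_request(effort)  # hypothetical: a real API call would go here
    return time.perf_counter() - start

def compare_efforts(
    send_request: Callable[[str], object],
    efforts: Iterable[str] = ("medium", "high", "xhigh"),
) -> Dict[str, float]:
    """Run one request per effort level and collect the timings,
    for side-by-side comparison with ChatGPT's two thinking modes."""
    return {effort: time_request(send_request, effort) for effort in efforts}

# Example with a stub in place of a real API call:
timings = compare_efforts(lambda effort: time.sleep(0.01))
```

With real requests, markedly different latencies between the effort levels would hint at which two of them ChatGPT's “Standard” and “Extended” options map to.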