I’m really glad this thread exists because I’m in the same boat about 5.1 getting retired, and I wanted to share a concrete way I’ve been comparing 5.1 Thinking and 5.2.
I’ve been running a lot of informal A/B tests between the two models. My process is pretty simple and repeatable:
- I open one chat with GPT-5.1 Thinking and one with GPT-5.2 Thinking.
- I give both chats the exact same prompt, attachments, and context, copied and pasted. For example: “Help me answer this discussion board question in my own voice,” or “Rewrite this email to sound more natural and human.”
- After both models reply, I paste their outputs into a new message and label them “Response Version 1” (5.1’s response) and “Response Version 2” (5.2’s response), without saying which model wrote which.
- I send that message to both chats and ask each model something like: “Here are two responses to the same prompt. Which one is stronger and why?”
- I repeat this across different tasks: scripts, essays, discussion board replies, explanations, product reviews, etc.
What’s wild is that in roughly 90 percent of these comparisons, both models pick 5.1’s answer as better. Even 5.2 consistently prefers the 5.1 output when it doesn’t know which one it wrote.
The reasons it gives usually line up with what people in this thread are already saying:
- 5.1 sounds more natural and less mechanical
- It follows nuanced style instructions more closely
- It organizes ideas better and adds useful detail without rambling
- It feels more like a human collaborator instead of a template generator
So from my perspective, it isn’t just a “vibe” thing. When the newer model, judging blind, repeatedly says “the other response is better,” that feels like a pretty clear signal that 5.1 is still the stronger model for a lot of real-world creative and writing tasks.
I understand from the support reply that retirement decisions happen at a broader platform level and can’t be reversed just because a few threads ask for it. But I really hope the team takes this kind of side-by-side evidence seriously. I attached a screenshot that shows 5.2 explicitly choosing 5.1’s answer and explaining why. If needed, I can provide many more screenshots of this happening, along with the actual outputs from both models, so you can see the clear drop in quality from 5.1 to 5.2.
At minimum, it would help a lot if 5.1 could stay available as a legacy or “creative” option for Plus users instead of being removed entirely. For many of us who use ChatGPT mainly for writing, research synthesis, and long-running projects, 5.1 isn’t interchangeable with 5.2 at all. And when the newer model itself keeps saying the older one is doing a better job, it’s hard to understand why the older one has to disappear instead of staying as another tool in the toolbox.
