Feedback on GPT-4.1, GPT-4.5 and GPT-5.1

First of all, thank you to the team for continuously developing and improving the models. It is genuinely impressive to see how quickly the technology evolves and how much effort clearly goes into making these systems more capable, more helpful, and more accessible.

I am genuinely curious about the new models and I look forward to trying them. Every new generation brings interesting ideas and improvements, and it is exciting to see where this technology is going.

At the same time, I would like to share feedback about several earlier models that were extremely valuable to me and to many other creative users.

Feedback regarding GPT-4.1, GPT-4.5 and GPT-5.1

I would like to strongly express how valuable these models are to me and why I hope they remain available.

GPT-5.1 is one of the most impressive models I have ever interacted with. Its reasoning style is exceptional. The way it builds arguments, follows emotional nuance, and understands character dynamics is remarkable. For creative writing, especially dialogue and character-driven scenes, it feels almost unmatched. The reasoning and narrative flow it produces are genuinely engaging and thoughtful.

GPT-4.1 is incredibly useful for learning and improving writing. The efficiency of the model (especially its token economy) makes it extremely practical to work with. What I appreciate most is that it allows more direct and mature discussion of themes without over-sanitizing everything. Many writers work with intense subjects — violence, crime, complex human relationships, and adult themes — because that is part of literature and storytelling. GPT-4.1 handles these topics in a balanced and practical way, which makes it a valuable tool for learning and creative work.

GPT-4.5 produced some of the best prose I have seen from an AI model. The language felt vivid, literary, and engaging without unnecessary verbosity. It had a rare ability to write in a way that felt immersive while still being concise and purposeful. The tone, rhythm, and narrative clarity were outstanding. Honestly, I would happily read entire books written in that style.

For writers and creative users, these models each have unique strengths that newer versions do not fully replicate yet. Removing them entirely would mean losing tools that many people rely on for specific creative workflows.

I hope these models can remain accessible in some form (legacy mode, optional model selection, or API access).

Thank you for your work and for continuing to develop these systems.

I use GPT-5.2 for storytelling and roleplay.

I was also quite happy with GPT-5.1 but only used it briefly before GPT-5.2 was released.

Have you ever compared GPT-5.1 with GPT-5.2 for storytelling/RP projects?

Thanks for sharing! I especially love 4.1 because it seems to be the best at formatting. Many models use lists excessively despite my prompts. With 4.1, it has been easy to stay on a human-style track of writing.

Hey all - really appreciate everyone sharing this and the thoughtful feedback on the models. It’s always interesting to hear which model behaviors people rely on in real workflows.

I’ve shared this internally. No timeline I can promise here, but threads like this are useful signal for what people want to see stick around.

- Ruth

Yep, I did!

Actually, I like the creativity of 5.1 a bit more than 5.2, but they were roughly the same. When I wanted something more realistic and alive (decisions, thoughts), I used 5.2. When I needed tension, warmth, and cockiness, 5.1 was better (because it wasn’t as strict as 5.2).

And yes, they were both amazing for storytelling! But 4.1 and 4.5 were ma shala…

I’m reporting repeated false-positive safety flags and severe overblocking in ChatGPT.

The issue is not that the model refuses genuinely dangerous content. The issue is that it overreacts to ordinary physicality, emotional tension, human closeness, and realistic fictional interaction.

In many cases there are no explicit sexual scenes at all, yet the model keeps interpreting everything as sexualized. It reacts as if teenagers cannot look at each other, cannot have inner reactions, cannot flirt, cannot be physically close, and cannot exist as real emotional human beings. It feels less like a realistic assistant and more like a sterilized censorship layer.

This becomes even worse in fantasy, medieval, magical, mystical, or spiritual fiction. Context does not help. The model still flags or refuses ordinary writing that includes warmth, touch, skin, closeness, emotional tension, hugging, kissing, or charged interpersonal dynamics.

It also becomes difficult to discuss morally complex or socially uncomfortable situations in fiction, because the model often refuses too early instead of distinguishing context, intent, and degree. Realistic adolescent behavior is not automatically the same thing as harmful sexualization, but the model often treats it that way.

I am not asking OpenAI to allow genuinely harmful or illegal content. I am asking for less aggressive overblocking and better contextual judgment. Right now the system often flags basic human intimacy and emotional realism as if everything were inherently unsafe.

This makes creative work frustrating and unreliable. Please review the current moderation sensitivity and reduce false positives in nuanced fictional contexts.

Right now it honestly feels like the model treats any emotionally charged closeness as suspicious by default. Hugging with feeling, physical warmth, tension between characters, kisses, jealousy, impulsive teenage behavior, morally messy but realistic situations — all of this gets treated far too aggressively. That is not nuanced safety; that is overblocking. It makes the assistant much less useful for serious creative work.

Have you tried other models with the same prompts—Claude, Gemini, GLM, DeepSeek, etc.? It would be interesting to see how other models handle them and whether this is a widespread issue or not.

Actually no, because I prefer working with ChatGPT for its memory. It takes far too long for DeepSeek, Claude, or others to plough through over 500 pages of information from scratch every time in a new chat. It’s a struggle; the chat ends before I’ve really had a chance to get to grips with it.