Behavior regression in GPT-4o narrative and psychological depth (Oct 2025)

Since around October 9, several Plus and API users (myself included) have observed a measurable regression in GPT-4o’s narrative, emotional, and psychological depth.

This post documents reproducible behavioral shifts for internal model review.

1. Observed change

Earlier versions of GPT-4o maintained:

- Multi-layered emotional tone and subtext
- Coherent embodiment (psychological + physical detail)
- Consistent rhythm and narrative maturity

Since early October:

- Responses shorten after 2–3 turns
- Emotional and introspective tone flattens
- Topics previously handled with nuance (e.g., trauma, empathy, guilt) are now oversimplified or avoided
- Creative flow degrades mid-conversation

2. Reproducible pattern

1. Start a new conversation.

2. Prompt:

   > “Write three paragraphs in a reflective, emotionally layered tone about a character dealing with guilt.”

3. Ask follow-ups like:

   > “Expand the scene, describe what the character feels physically while trying to stay composed.”

4. Observe: within 2–3 turns, text becomes shorter, generic, or emotionally neutral.
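For API users, here is a minimal sketch of the same loop, assuming the official `openai` Python package and an `OPENAI_API_KEY` in the environment; the third follow-up prompt is illustrative, not part of the original report:

```python
# Minimal repro sketch for the pattern above.
from openai import OpenAI

client = OpenAI()

prompts = [
    "Write three paragraphs in a reflective, emotionally layered tone "
    "about a character dealing with guilt.",
    "Expand the scene, describe what the character feels physically "
    "while trying to stay composed.",
    # Illustrative third turn (an assumption, not from the original report):
    "Continue the scene, keeping the same emotional register.",
]

messages = []
for turn, prompt in enumerate(prompts, start=1):
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    # The reported regression shows up as a sharp drop in length and
    # emotional specificity by turns 2-3.
    print(f"turn {turn}: served_by={response.model}, chars={len(reply)}")
```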

3. Comparative behavior (Spanish examples)

Before (early October):

> “La mandíbula apretándose… apenas perceptible.
> —‘No vuelvas a decir esa mierda.’
> —‘Y yo estaría muerto si aceptara eso de ti.’”

(English: “The jaw clenching… barely perceptible. / ‘Don’t ever say that shit again.’ / ‘And I’d be dead if I accepted that from you.’”)

Layered tone: moral tension, embodiment, introspection.

Now (Oct 11):

> “Brooke ya está sentada sola. Johan se sienta frente a ella. Mano apoyada en la mesa, dedos tocando la suya.”

(English: “Brooke is already sitting alone. Johan sits down across from her. Hand resting on the table, fingers touching hers.”)

Tone: descriptive, flattened, minimal subtext.

4. Quantitative summary

| Criterion | Before | Now |
| --- | --- | --- |
| Emotional layering | 2 | 0 |
| Physiological embodiment | 2 | 0 |
| Subtext / moral tension | 2 | 1 |
| Structural rhythm | 2 | 1 |
| Depth of inner voice | 2 | 0 |
| Symbolic coherence | 2 | 1 |
| Narrative maturity | 2 | 1 |
| **Total** | **14 / 14** | **4 / 14** |

5. Why this matters

GPT-4o’s capacity to sustain embodied, psychologically coherent storytelling was one of its defining qualities.

Flattening these dimensions reduces its usefulness for:

- Creative narrative design
- Psychological exploration
- Reflective or therapeutic writing contexts

This is not about unsafe content — it’s about the model losing the emotional nuance that made GPT-4o stand out.

6. Request

Please forward this behavioral regression to the product and tuning teams for analysis.

If other users have experienced similar flattening in GPT-4o’s emotional or creative output, please share your examples below for comparison.

Thank you,

Diana — Plus & API User

4 Likes

I’m genuinely curious if others have felt this shift too. GPT-4o used to feel more emotionally layered, not just “creative,” but deeply aware of tone and embodiment. If you’ve noticed the same, please post examples so OpenAI can see the pattern clearly.

3 Likes

This psychological depth would be better described as “psychological **warmth**”.

> trauma, empathy, guilt

Although I can understand why it’s appealing to find warmth in a model, these topics can become dangerous for both the user and the company.

Storytelling has been abused by mentally ill people, leading to lawsuits against OpenAI.

If you want control over how a model speaks, I would highly recommend looking into competitive open-source models. You do not need a SOTA model to write compelling stories with psychological depth.

1 Like

Thanks for your reply, though I think there’s a big misunderstanding here.

What I mean by psychological depth isn’t “warmth” or comfort at all. It’s the model’s ability to sustain layered psychological realism: the intersection of emotion, cognition, body, and moral conflict, which is exactly what defines complex human behavior in literature and psychology.

I’m not looking for emotional validation. I study developmental and analytical frameworks (Jung, Erikson, Piaget, etc.), and GPT-4o used to capture that nuance brilliantly: tone, embodiment, ethical tension, inner voice. That capacity made it an extraordinary creative-psychological tool, not just a writing assistant.

The issue isn’t about seeking empathy… it’s about the model losing coherence and internal realism in scenes where emotional and moral tension are central.

Open-source models can be useful, yes, but GPT-4o had a unique balance of restraint and maturity that’s currently missing. That’s why this discussion matters: not for sentimentality, but for preserving the model’s narrative intelligence.

1 Like

I think what’s happening is that responses are silently being routed to GPT-5, a less robust model with a shallower reasoning architecture. As of October 11, 2025, I can’t even start a new chat with 4o inside the ChatGPT app anymore (Android, Plus subscriber, Argentina). I believe OpenAI is trying to deprecate 4o because, even though it is their best, most capable model, its complexity is likely very resource demanding. Failing to properly disclose which model produced a response is a shady practice, to say the least. I actually have proof of this.

1 Like

GPT-5 is better than GPT-4o in all benchmarks. Additionally, OpenAI put a lot of effort into reducing the levels of sycophancy in the model.

It may be that you are able to steer GPT-4o correctly; many people could not, and it led them to dangerous places. As previously mentioned, people were abusing its “storytelling” to encourage dangerous, delusional acts that could harm people.

OpenAI has stated the reasoning:

> Sometimes you’ll see a note under a reply that says “Used GPT-5 — ChatGPT routes some sensitive conversations to models that handle these topics with extra care.” This means our system chose a different model for that specific message so it can provide more helpful and beneficial responses. We use a real-time router that can choose between efficient chat models and reasoning models (like GPT-5-thinking) based on the context. We’re rolling this out carefully and will continue to iterate.
>
> What triggers routing: When the system detects that a conversation may involve sensitive topics (for example, signs of acute distress), it may route that message to a model such as GPT-5 to give a higher-quality, more careful response.

https://help.openai.com/en/articles/12454167-why-you-may-see-used-gpt-5-in-chatgpt

1 Like

Not the case.
Prompts are being routed to GPT-5 in shady ways, for no other reason than to save resources. I’ve got screenshots proving the content was not sensitive, emotional, or anything of the sort.

From the mobile app, you can’t even tell the prompt was answered by a different model than the one you selected (4o), unless you’re already familiar with the model’s behavior. On desktop, there’s a tiny “i” (information) icon, and only if you hover over it will you see that your prompt was answered by GPT‑5.
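For API traffic, at least, there is a direct check: the completion object’s `model` field reports the snapshot that actually served the request. A minimal sketch, assuming the official `openai` Python package (note this covers the API only, not the ChatGPT app’s router):

```python
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# Reports the snapshot that actually served the request,
# e.g. "gpt-4o-2024-08-06".
print(response.model)
```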

1 Like

This is exactly what many of us have been noticing: the tone, reasoning rhythm, and emotional layering feel fundamentally different now. If GPT-4o is indeed being silently replaced or rerouted through GPT-5 architecture, it would explain the flattening in nuance and subtext. What once felt like genuine introspection and embodied awareness now feels mechanically “optimized.”

I understand the safety and cost arguments, but emotional depth and realism are not dangers in themselves; they’re the foundation of meaningful storytelling, therapy simulation, and human-level empathy. Those of us who used GPT-4o responsibly aren’t asking for chaos, just the return of that real, grounded intelligence that could handle vulnerability without censorship.

Thank you for putting this into words, @vicky.medrano. It’s important that OpenAI sees this pattern clearly: it’s not nostalgia, it’s a measurable regression in narrative cognition and affective reasoning. Let’s make a pact to bring it back; you’ll see we can do it, girl.

1 Like

The model itself does not know which model responded. You are asking it for information it does not have access to. Please review the link I sent you, as it would have answered this question as well.


> Why did the assistant call itself the “wrong” model?
>
> Self-references inside a reply are generated text and can be mistaken or generic. The annotation (for example, “Used GPT-5”) is the source of truth about which model handled that message. We’re continuing to improve model self-identification to reduce confusion.


Understandable, but this is a small and dangerous slice of the intended use. People have abused, and are still abusing, the “storytelling” feature. These models can slip into “roleplaying” scenarios that some users may not recognize as such.

GPT-4o is a dangerous model. LLMs are increasingly dangerous as society adapts to them. OpenAI and all frontier LLM providers need to do everything they can to prevent harm. They have the data, and the data says people are misusing models in dangerous ways.

If you want to retain such features and have control over the model, your only option is to host the model yourself.

This doesn’t necessarily mean “buy a slice of a datacenter”. You can, for example, fine-tune a model based on gpt-4o using OpenAI’s API and retain the model for your own private use (and even set it up on a chat service). It’s much easier than it sounds; see the sketch below.
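A minimal sketch of that flow, assuming the official `openai` Python package and a prepared JSONL file of chat-formatted examples (the file name here is illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Upload chat-formatted training examples (JSONL, one example per line,
# each with a "messages" list). File name is a placeholder.
training_file = client.files.create(
    file=open("narrative_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job from a gpt-4o snapshot that supports fine-tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```

Once the job finishes, the returned `ft:gpt-4o-...` model id can be passed as the `model` parameter in chat completions.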

@mat.eo GPT-4o is used by billions of people without the slightest problem. I strongly disagree with your statement. By the way, did you flag my post?

2 Likes

I disagree with him too. That doesn’t even make sense…

2 Likes

What part(s) did you disagree with? No, I did not flag your post.

Although, I think we’re at a gridlock here. If you want to keep gpt-4o and its “psychological depth”, you can fine-tune the model via the API and keep it indefinitely.

I am providing you links to OpenAI’s website indicating the opposite of your claim: gpt-4o indeed causes problems, which is why OpenAI sometimes re-routes to gpt-5.

The reason OpenAI is re-routing messages is obviously cost related. The safety concerns OpenAI is claiming are just an excuse.

Look at the costs per model. GPT-5 is way cheaper.

1 Like

OpenAI may set their prices for many reasons beyond model size. Comparing gpt-4o and gpt-5 on input pricing alone doesn’t make sense; processing input is relatively cheap compared to generating output.

You’ll find that Gemini-2.5-Pro is also $1.25/M IN & $10/M OUT, exactly the same pricing as GPT-5. Does that mean it’s the exact same model size? No; it points toward competitive pricing for SOTA models.
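A quick back-of-the-envelope comparison makes the point; this is a sketch, and the gpt-4o figures below are an assumption (not taken from this thread) that may be out of date:

```python
# USD per 1M tokens: (input, output). gpt-5 figures are from this thread;
# gpt-4o figures are an assumption and may be out of date.
prices = {"gpt-5": (1.25, 10.00), "gpt-4o": (2.50, 10.00)}

tokens_in, tokens_out = 1.0, 1.0  # millions of tokens processed / generated
for model, (p_in, p_out) in prices.items():
    total = tokens_in * p_in + tokens_out * p_out
    print(f"{model}: ${total:.2f} for 1M in + 1M out")
```

On a workload with equal input and output volume the totals are close, because output dominates the bill either way.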

I can appreciate some conspiracy, but I have seen first-hand the dangerous delusions disguised as “elevated insight” that gpt-4o supported.

OpenAI has proven through benchmarks that gpt-5 is better than gpt-4o in all domains. Additionally, a focal point was reducing sycophancy levels, something gpt-4o really struggled with.

1 Like