I’ve been testing `gpt-5-chat-latest` vs `gpt-5` in the API for our AI mental health companion app (www.onsenapp.com).
Key Observations
- `gpt-5` is not ideal for conversational chat experiences. Its responses tend to:
  - Lack personality
  - Be overly structured (too many bullet points/lists)
  - Feel unnatural in the flow of a continuous conversation
- `gpt-5-chat-latest` performs much better for chat, with a tone and “vibe” more similar to `gpt-4.1`.
The Trade-off
Since `gpt-5-chat-latest` does not support structured outputs, our model-selection logic has become more complex (a rough sketch of the routing follows this list):
- For natural conversation: `gpt-5-chat-latest` works best
- For structured outputs & complex instruction-following: `gpt-5` works best
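For anyone dealing with the same split, here is a minimal sketch of the routing we ended up with. It assumes the official OpenAI Python SDK; the `SessionSummary` schema, the `respond` helper, and the `needs_structured_output` flag are illustrative placeholders, not our actual production code.

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Illustrative schema for the structured-output path (not our real one).
class SessionSummary(BaseModel):
    mood: str
    key_topics: list[str]

def respond(messages: list[dict], needs_structured_output: bool):
    if needs_structured_output:
        # gpt-5 for schema-constrained output and complex instruction-following.
        completion = client.beta.chat.completions.parse(
            model="gpt-5",
            messages=messages,
            response_format=SessionSummary,
        )
        return completion.choices[0].message.parsed
    # gpt-5-chat-latest for the natural conversational tone.
    completion = client.chat.completions.create(
        model="gpt-5-chat-latest",
        messages=messages,
    )
    return completion.choices[0].message.content
```

With `gpt-4.1` this was a single call regardless of whether we asked for structured output, which is exactly the extra decision point I’d like to see go away.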
Previously, `gpt-4.1` (and `gpt-4o` before that) handled both use cases equally well, so I’m disappointed that the GPT-5 family now forces one more decision point into our prompt-engineering pipeline. This feels like a step away from the “one model to rule them all” philosophy that GPT-5 was supposed to bring.
Gap in the Portfolio
There’s currently no mini equivalent of `gpt-5-chat-latest`. This creates a gap:
- `gpt-5-mini` is not suitable for conversational experiences
- There’s no drop-in replacement for `gpt-4.1-mini` in chat-heavy scenarios
In short: GPT-5 offers stronger specialization but weaker versatility for teams building products that rely on both natural conversation and structured output from the same model family.