Observed Behavioral Patterns in GPT-5.3 and GPT-5.4 Affecting Creative Workflows

Dear OpenAI Product and Engineering Teams,

I would like to share several behavioral patterns I have repeatedly observed while using GPT-5.3 Instant and GPT-5.4 Thinking in longer conversations.

These observations come from extended real-world usage rather than isolated examples.

My intention is not to criticize the models broadly, but to document patterns that significantly affect usability for creative and analytical workflows.

1. Observed Issues with GPT-5.3 Instant

A. Weak context retention even within short conversational spans

In multiple sessions, GPT-5.3 appears to lose track of information that was stated only a few turns earlier, even when the user explicitly clarifies their intent.

This leads to situations where the model responds as if earlier context had not been provided.

For workflows that depend on multi-step reasoning or iterative refinement, this behavior can make conversations unstable.

B. Literal interpretation of rhetorical language

Another recurring pattern occurs when the user makes a metaphorical or rhetorical statement.

Observed pattern:

- The user uses figurative language.
- The model interprets the statement literally.
- The model then attempts to “correct” the user.

This can create friction in conversations where nuance or metaphor is intended.

For creators, writers, or analysts who frequently use rhetorical language, this behavior disrupts the flow of discussion.

C. Low information density in responses

When requesting deeper analysis, replies sometimes become generic or repetitive, even when the prompt clearly asks for structured or detailed reasoning.

This reduces the usefulness of the model in tasks that require:

- concept exploration
- creative development
- structured analysis

D. Incorrect assumptions about user intent

In some cases, the model appears to infer intentions that were never stated.

For example:

- The user clarifies a specific constraint.
- The model responds by addressing a different, unstated assumption.

This creates additional steps in the conversation because the user must correct the model’s interpretation before continuing.

2. Observed Issues with GPT-5.4 Thinking

GPT-5.4 Thinking demonstrates stronger reasoning ability compared with GPT-5.3 in many cases. However, during extended use I repeatedly observed two interaction patterns that significantly affect the usability of the model in nuanced discussions.

A. Template-like response pattern

A recurring response structure appears frequently in GPT-5.4 replies. The pattern typically follows this sequence:

1. The model first affirms the user’s statement.
2. It explains why the user’s point is reasonable.
3. It then introduces a “counterpoint” or caution.

While balanced reasoning is generally helpful, the structure often feels highly predictable and formulaic.

More importantly, in several cases the “counterpoint” appears to reinterpret aspects of the user’s argument that were actually intended as strengths rather than weaknesses. This creates the impression that the model is introducing criticism primarily to maintain rhetorical balance, rather than responding directly to the user’s intent.

For discussions that require nuanced reasoning—especially creative or conceptual dialogue—this response pattern can make the interaction feel less natural and less context-sensitive.

B. Mixed-language output in Chinese conversations

Another issue I observed is the frequent insertion of English phrases inside otherwise Chinese responses.

In many replies, technical terms or partial sentences appear in English even though the conversation context is entirely Chinese.

This creates several usability problems:

- It disrupts reading flow for users who expect a fully Chinese response.
- Some English phrases appear without clear necessity or explanation.
- In longer answers, the mixed language can make the response harder to follow.

For Chinese-language users, maintaining consistent language output would significantly improve clarity and readability.

3. Impact on creator workflows

These patterns become especially noticeable for users who rely on the model for:

- writing
- conceptual brainstorming
- philosophical discussion
- multi-step creative development

In such contexts, nuance and conversational continuity are particularly important.

4. Suggestion

One possible improvement could be introducing a mode optimized for creative and exploratory dialogue, where the model:

- preserves conversational nuance more reliably
- interprets figurative language more flexibly
- prioritizes contextual continuity across turns

This could significantly improve the experience for creators who rely on nuanced communication.

Thank you for your continued work on improving the models.

I hope these observations are useful for future iterations.


While GPT-5.3 and GPT-5.4 may feel less suitable for creative writing, their translation accuracy appears noticeably stronger than GPT-5.1’s.
