ChatGPT 5.4 Pro Standard Mode - Adaptive Thinking or Nerfed Model?

Hi everyone,

I’m trying to determine whether other users are seeing a similar behavior change with GPT-5.4 Pro Standard on long-context, high-effort tasks.

I’m not claiming a confirmed backend bug. I’m looking for comparison data, because the change I observed is large enough that it does not look normal.

What I tested

I have a repeatable long-context task that requires the model to:

  1. read a large uploaded context/file packet,

  2. reconcile multiple source documents,

  3. identify pending work,

  4. produce a concrete written deliverable,

  5. include an actionable implementation/review plan.

This is not a short Q&A prompt. It is the kind of task where the model needs sustained reasoning and careful file/context handling.

What I observed (using the same task to gather empirical diagnostic data)

A prior run of the same class of task, using GPT-5.4 Pro Standard, took roughly 60 minutes and completed the work correctly.

A later run, also using GPT-5.4 Pro Standard, completed in roughly 8 minutes, but the output was materially lower quality. It looked more like a readiness/summary response than the actual requested deliverable. Same task and files; the behavior just changed from one day to the next.

The issue was not simply that the model was faster. The issue was:

GPT-5.4 Pro Standard run A: ~60 minutes, complete and correct
GPT-5.4 Pro Standard run B: ~8 minutes, incomplete and missing the core deliverable

Why this seems concerning

For this task type, a correct answer required the model to stay engaged across a large context and produce a concrete output. Instead, the shorter run appeared to stop at a high-level framing/acknowledgement stage.

The shorter run did not just compress the work. It skipped the central artifact the task required.

This resembles a lower effective reasoning-effort budget, but I cannot see the hidden backend setting, so I do not know whether the cause is:

a temporary routing/configuration issue,
a hidden reasoning-effort change,
file/context handling degradation,
early stopping behavior,
or normal model variance.

Why I do not think this is just normal variation

A swing from about 60 minutes to about 8 minutes for the same class of long-context task is large by itself.

But the stronger signal is output completeness:

Earlier run: long duration, complete deliverable
Later run: short duration, plausible-looking summary, missing deliverable

The later answer looked superficially responsive, but it did not complete the actual work requested.

This is a pattern I’ve noticed before when a new model was released and I kept using the same “old”, no-longer-latest model. Since this happened on a Saturday, Apr 18th, maybe a new model is about to come out, but that is not something I can know.

Secondary tool/context anomalies

I also noticed some possible tool/context weirdness during diagnostics, though these may be separate issues:

  • uploaded file retrieval seemed inconsistent;

  • search over uploaded/context files appeared to surface unrelated prior material;

  • a simple Python/stdout test behaved inconsistently in one diagnostic path, while a direct Python path worked (a minimal version of that test is sketched below).

Again, those may be unrelated, but I’m mentioning them in case others are seeing similar clusters.
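For reference, here is a minimal sketch of the kind of Python/stdout check I mean. It assumes the flaky "diagnostic path" is the built-in code-execution tool; the marker string and the stderr check are my own additions for illustration, not part of any official diagnostic.

```python
import sys

# Minimal stdout sanity check (hypothetical): if the marker does not come
# back verbatim from the execution tool, the tool/output path is suspect
# rather than the model's reasoning itself.
MARKER = "DIAG-STDOUT-TEST-001"

print(MARKER)                    # should appear unmodified in the tool output
print(MARKER, file=sys.stderr)   # also check whether stderr is surfaced
sys.stdout.flush()
```

If others want to compare, running the same check through both paths should show quickly whether the inconsistency reproduces.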

Questions for other users

Has anyone else recently seen GPT-5.4 Pro Standard:

  • finish long reasoning tasks much faster than before;

  • produce a plausible-looking summary instead of the requested artifact;

  • appear to use a lower effective thinking budget;

  • skip file/artifact production in tasks where prior runs completed it;

  • behave differently across otherwise similar Standard-mode sessions?

Useful comparison data would be the following (a rough template for recording it is sketched after the list):

same or similar prompt
same uploaded/context size
model setting used
earlier run duration and quality
later run duration and quality
whether the final deliverable was actually produced
whether the run seemed to stop at summary/readiness instead of execution
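To make those comparisons easier to aggregate, here is one possible way to record each run. The field names and example values are just my suggestion, not any official schema.

```python
# Hypothetical per-run record for collecting the comparison data listed above.
# Field names and values are illustrative only.
run_record = {
    "prompt_id": "long-context-deliverable-v1",   # same or similar prompt
    "model_setting": "GPT-5.4 Pro Standard",
    "context_size": "approximate size of uploaded/context files",
    "run_date": "YYYY-MM-DD",
    "duration_minutes": 8,
    "deliverable_produced": False,   # was the requested artifact actually produced?
    "stopped_at_summary": True,      # stopped at summary/readiness instead of execution
    "quality_notes": "plausible summary, core deliverable missing",
}
```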

I’m trying to determine whether this is expected variance, a temporary configuration/routing issue, a file/context handling issue, or a broader regression in effective long-context reasoning within GPT-5.4 Pro Standard.

Everything seems faster overall, and most users will likely notice that ChatGPT is a lot faster, roughly 4x or more. Before, it took longer to go from sending a prompt to Thinking, and the steps in the Thinking tab usually took longer; now it all moves much more quickly, almost like a real-time chat. I just want to know whether this is the new normal so I can work out how to engineer around it or look at alternatives.


I’m mostly trying to understand if this is the new normal, because then I need to know how to engineer around it, change my workflows, or look at alternatives.

I have Pro 20x, so I’m also wondering whether this is going to become a new subscription tier later, with fewer of these limits or deeper Thinking available.

Not saying I know what changed, because I can’t see the backend settings. But it feels similar to the recent Anthropic Adaptive Thinking change: everything got faster, but the deep reasoning feels like it may have been reduced. Curious if anyone else is seeing the same thing.

I am also experiencing this, and it is a very bad experience for someone who has subscribed to Pro since 2023. The quality of answers from this faster thinking and exploration loses the value I feel ChatGPT offers over Claude and Gemini. For complex tasks, it often used to take 60-90 minutes to get an answer; now it collapses to ~10 minutes, with clearly shallow evidence and reasoning. This may be due to cost cutting at OpenAI (like closing Sora, etc.), but I feel it may cost OpenAI its most unique customers, the ones who frequently use Pro to address complex, long-context problems. I would simply be disappointed in ChatGPT if they are doing this for budget cutting.