ChatGPT 5.4 Pro Standard Mode - Adaptive Thinking or Nerfing Model?

Hi everyone,

I’m trying to determine whether other users are seeing a similar behavior change with GPT-5.4 Pro Standard on long-context, high-effort tasks.

I’m not claiming a confirmed backend bug. I’m looking for comparison data because the change I observed is large enough that it does not look normal.

What I tested

I have a repeatable long-context task that requires the model to:

  1. read a large uploaded context/file packet,

  2. reconcile multiple source documents,

  3. identify pending work,

  4. produce a concrete written deliverable,

  5. include an actionable implementation/review plan.

This is not a short Q&A prompt. It is the kind of task where the model needs sustained reasoning and careful file/context handling.

What I observed (using the same task, so as to have empirical diagnostic data)

A prior run of the same class of task, using GPT-5.4 Pro Standard, took roughly 60 minutes and completed the work correctly.

A later run, also using GPT-5.4 Pro Standard, completed in roughly 8 minutes, but the output was materially lower quality. It looked more like a readiness/summary response than the actual requested deliverable. Same task and files; it just changed from one day to the next.

The issue was not simply that the model was faster. The issue was:

GPT-5.4 Pro Standard run A: ~60 minutes, complete and correct
GPT-5.4 Pro Standard run B: ~8 minutes, incomplete and missing the core deliverable

Why this seems concerning

For this task type, a correct answer required the model to stay engaged across a large context and produce a concrete output. Instead, the shorter run appeared to stop at a high-level framing/acknowledgement stage.

The shorter run did not just compress the work. It skipped the central artifact the task required.

This resembles a lower effective reasoning-effort budget, but I cannot see the hidden backend setting, so I do not know whether the cause is:

  • a temporary routing/configuration issue,
  • a hidden reasoning-effort change,
  • file/context handling degradation,
  • early stopping behavior,
  • or normal model variance.

Why I do not think this is just normal variation

A swing from about 60 minutes to about 8 minutes for the same class of long-context task is large by itself.

But the stronger signal is output completeness:

Earlier run: long duration, complete deliverable
Later run: short duration, plausible-looking summary, missing deliverable

The later answer looked superficially responsive, but it did not complete the actual work requested.

This is a repeated pattern I’ve noticed before around new model releases, when one keeps using the “old” (no longer latest) model. Maybe that is the case here, since this happened on Saturday, April 18th: a new model might be coming out or something, but that is not something I can know.

Secondary tool/context anomalies

I also noticed some possible tool/context weirdness during diagnostics, though these may be separate issues:

  • uploaded file retrieval seemed inconsistent;

  • search over uploaded/context files appeared to surface unrelated prior material;

  • a simple Python/stdout test behaved inconsistently in one diagnostic path, while a direct Python path worked.

Again, those may be unrelated, but I’m mentioning them in case others are seeing similar clusters.

Questions for other users

Has anyone else recently seen GPT-5.4 Pro Standard:

  • finish long reasoning tasks much faster than before;

  • produce a plausible-looking summary instead of the requested artifact;

  • appear to use a lower effective thinking budget;

  • skip file/artifact production in tasks where prior runs completed it;

  • behave differently across otherwise similar Standard-mode sessions?

Useful comparison data would be:

  • same or similar prompt
  • same uploaded/context size
  • model setting used
  • earlier run duration and quality
  • later run duration and quality
  • whether the final deliverable was actually produced
  • whether the run seemed to stop at summary/readiness instead of execution
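To keep that comparison data consistent between reports, each run could be logged as a small record. The following is only a sketch: `RunRecord`, its field names, and the `compare` helper are hypothetical, chosen to mirror the list above, and the sample values are the approximate figures from the two runs described in this post.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One observed run; field names are hypothetical and mirror the list above."""
    label: str
    model_setting: str
    context_note: str           # free-text note on the uploaded/context size
    duration_min: float         # wall-clock minutes until the final answer
    deliverable_produced: bool  # was the requested artifact actually written?
    stopped_at_summary: bool    # did the run end at a summary/readiness stage?


def compare(earlier: RunRecord, later: RunRecord) -> dict:
    """Summarize how a later run differs from an earlier baseline run."""
    return {
        "speedup": earlier.duration_min / later.duration_min,
        "deliverable_lost": earlier.deliverable_produced
        and not later.deliverable_produced,
        "summary_only": later.stopped_at_summary,
    }


# Approximate figures from run A and run B described earlier in the post.
run_a = RunRecord("A", "GPT-5.4 Pro Standard", "same file packet", 60, True, False)
run_b = RunRecord("B", "GPT-5.4 Pro Standard", "same file packet", 8, False, True)

report = compare(run_a, run_b)
print(report)
```

A speedup on its own would not be a concern; the combination of a large speedup with `deliverable_lost` set is the pattern worth reporting.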

I’m trying to determine whether this is expected variance, a temporary configuration/routing issue, a file/context handling issue, or a broader regression in effective long-context reasoning within GPT-5.4 Pro Standard.

Everything seems faster, and other users have likely noticed that ChatGPT is a lot faster, like 4x or more. Before, it took longer to go from sending a prompt to Thinking, and the steps in the Thinking tab usually took longer; now it all goes much, much quicker, similar in a way to a real-time chat.

I’m mostly trying to understand if this is the new normal, because then I need to know how to engineer around it, change my workflows, or look at alternatives.

I have Pro 20x, so I’m also wondering if maybe this is going to become a new subscription tier later, with fewer of these limits or deeper Thinking available.

Not saying I know what changed, because I can’t see the backend settings. But it feels similar to the recent Anthropic Adaptive Thinking change: everything got faster, but the deep reasoning feels like it may have been reduced. Curious if anyone else is seeing the same thing.

I am also experiencing this, and I feel it is a very bad experience; I have subscribed to Pro since 2023. The quality of answers with this faster thinking and exploration loses the value I feel ChatGPT offers over Claude and Gemini. For complex tasks, it used to take 60-90 minutes to get an answer; now that collapses to ~10 minutes, with clearly shallow evidence and reasoning. This may be due to cost cutting at OpenAI (like closing Sora, etc.), but I feel it may cause OpenAI to lose its most unique customers, who frequently use Pro for complex, long-context problems. I simply feel disappointed in ChatGPT if they are doing this to cut budget.

I recommend using evals to figure out what’s going on and to prove to OpenAI that your usage is degraded and worthy of compensation.

It’s complicated to set up. Let GPT 5.4 Pro do it for you. You might even get your workflow to eval itself and restart if the output does not match your pre-defined expectations.

See “Working with evals” on developers(dot)openai(dot)com.
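As a rough idea of what such a self-check could look like before setting up real evals, here is a minimal local sketch. It does not use the OpenAI evals tooling at all; `REQUIRED_SECTIONS`, `READINESS_PHRASES`, and `grade_output` are hypothetical names, and the section titles and phrases are assumptions based on the deliverable described in the original post.

```python
import re

# Hypothetical markers the finished deliverable is expected to contain.
REQUIRED_SECTIONS = ["Pending work", "Deliverable", "Implementation plan"]

# Assumed phrases suggesting the run stopped at a readiness/summary stage.
READINESS_PHRASES = [
    r"\bready to (begin|proceed|start)\b",
    r"\bnext steps? would be\b",
]


def grade_output(text: str, min_words: int = 500) -> dict:
    """Check a model answer against simple, pre-defined expectations."""
    lowered = text.lower()
    missing = [s for s in REQUIRED_SECTIONS if s.lower() not in lowered]
    hits = [p for p in READINESS_PHRASES if re.search(p, text, re.IGNORECASE)]
    word_count = len(text.split())
    passed = not missing and not hits and word_count >= min_words
    return {
        "passed": passed,
        "missing_sections": missing,
        "readiness_hits": hits,
        "word_count": word_count,
    }


# A summary-style answer fails; an answer with all sections passes.
shallow = grade_output("I have reviewed the files and am ready to proceed.")
full = grade_output(
    "Pending work: item one. Deliverable: the report text. "
    "Implementation plan: review, then ship.",
    min_words=5,
)
print(shallow["passed"], full["passed"])
```

A workflow could call `grade_output` on each answer and re-run the task when `passed` is false, while logging the failures as evidence.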

Yep, the very same experience here. Tasks that ran for 25 to 40 minutes only 2 days ago are finishing in 4 to 8 minutes now, with noticeably shallower output. ChatGPT Extended Pro just went from slower and better than Gemini and Claude to faster and worse. The current output is absolutely unusable in my field.

This is systemic and is affecting everyone; many are noticing it now that the weekend is almost over and they are returning to their daily activities with ChatGPT. Even you will notice it if you use it.

But thanks for the input; I didn’t know about ‘evals’. I am more of a pragmatic person: to me it is not about compensation or refunds for a paid service that is monthly and can be cancelled.

I always knew this day would come; I just want to know whether this is the case now, and if so, I'm OK with it.

You can’t build something for everyone. Since this will likely be noticed by only 10% or maybe 15% (if pushing it) of ChatGPT users (non-Codex), paying users included, it is what it is: one has to find alternatives or wait until the water calms and the AI bubble touches reality.

For those who use it for business or something, that might be different. I use it for Intelligence, Math, Science, Software Architecture, Deep Analysis, etc. My own Genius companion.

I use Claude Code (Opus) as my terminal driver and Gemini for Deep Research (because of Google Search).

So I can wait. One thing we have in life is Time. For those who say:

"Who has Time? But then if we never take time, how can we have time?"
(quote from some AGI or ASI, unsure…)

I mentioned evals because OpenAI engineers seem to be genuinely interested in fixing these things but need useful technical data to debug. Evals can provide this.

But here are some pragmatic suggestions:

  1. Always ask for planning even if you don’t use the real planning mode (/plan) and don’t want to be asked questions. It makes sure that the task is strictly following a step-by-step process and does not skip anything.
  2. Clearly indicate in your prompt that you are interested in long-running tasks that generate large outputs, e.g. “Plan this thoroughly. Don’t rush it. I want quality work, not quick results.” Keep your prompts and your AGENTS.md short and concise so that those important sentences are not overlooked. Most people want quick results. You might have encountered a “system optimization” by OpenAI intended for those users.
  3. Your “repeatable long-context task” might need a RUNBOOK.md in the working directory that contains all the necessary steps, detailed and in order. Then your prompt can be reduced to “Plan, then start a long-running task based on RUNBOOK.md and the input files. Plan this thoroughly. Don’t rush it. I want quality work, not quick results.”
  4. Always use the terminal. The Codex CLI or a wrapper for the API. In my experience that’s much more capable for long-running tasks.
  5. Add more tooling that supports the AI. I run long implementation steps that take 1–2 hours, based on a large ROADMAP.md that takes weeks to implement in full. I noticed that this needs issue tracking, or ChatGPT drifts and takes shortcuts. After I gave it the Linear MCP, it started using it heavily. It even creates issues for advanced behavior like “close-out”. If you want 6 months of Linear for free: Steve Huynh from “A Life Engineered” currently runs a promo. See the description of “Casey Muratori Doesn’t Care About AI (Here’s Why)” on his YouTube channel. It’s recent enough to still be active. (I am not affiliated with Linear in any way.)
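To illustrate suggestion 3, here is a small sketch of turning a RUNBOOK.md into the short prompt quoted above. Everything in it is an assumption for illustration: the helper names `load_steps` and `build_prompt`, the numbered-step runbook layout, the appended checklist, and the sample steps (paraphrased from the task in the original post). It only builds the prompt text; it does not call any API.

```python
import re

# The short prompt from suggestion 3, with straight quotes.
PROMPT = (
    "Plan, then start a long-running task based on RUNBOOK.md and the input "
    "files. Plan this thoroughly. Don't rush it. I want quality work, not "
    "quick results."
)

# Hypothetical runbook content, paraphrasing the task in the original post.
SAMPLE_RUNBOOK = """\
1. Read the uploaded context/file packet
2. Reconcile the source documents
3. Identify pending work
4. Produce the written deliverable
5. Include an implementation/review plan
"""


def load_steps(runbook_text: str) -> list[str]:
    """Return the text of every numbered step ("1. ...") in the runbook."""
    pattern = r"^[ \t]*\d+\.[ \t]+(.+)$"
    return [m.group(1).strip() for m in re.finditer(pattern, runbook_text, re.MULTILINE)]


def build_prompt(runbook_text: str) -> str:
    """Append a per-step checklist (an assumed convention) to the short prompt."""
    checklist = "\n".join(f"- [ ] {step}" for step in load_steps(runbook_text))
    return f"{PROMPT}\n\nConfirm each step when done:\n{checklist}"


steps = load_steps(SAMPLE_RUNBOOK)
prompt = build_prompt(SAMPLE_RUNBOOK)
print(prompt)
```

Keeping the steps in the file and the prompt short follows the advice above: the instructions that matter stay visible, and the same step list can later be used to check whether the output skipped anything.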

Genuinely, thanks for the help, but as I said, this is Systemic and is NOT Codex. I don’t use Codex at all, and it is completely unrelated. Everything you stated in this last reply does apply to “coding agents” or things one can run locally with a bit more control.

This is fine and just shows where we are today in the AI/LLM space. For folks who equate them: they are not the same thing. They might share some core pieces but are totally different.

You are the total opposite of a user like me. You either pay for the OpenAI API to use Codex, in which case all you said is valid, or you pay for ChatGPT and OAuth into Codex and don’t use ChatGPT at all, so you cannot and will not see the change that was made on April 18th.

I am using both Codex and direct REST API calls to GPT-5.4 Pro Standard, and I don’t have the issues you are describing. But maybe we are served by different data centers.

In my experience both Codex and API workflows benefit from evaluation, tooling, and better prompting practices that make your workflows more resilient to OpenAI’s backend changes. It is possible to build planning, issue tracking and knowledge DBs around the API.

Are you referring to MathCo/Systemic? Would love to know more about their workflows and how they use the API. They surely use evals, no?

Since you use it via the API and query it there rather than through the site UI, the issue is likely not present for you.

This issue has been widely reported as of Monday across all public channels: Twitter (X), Reddit, etc. So good for you that it is not affecting you. I’m not referring to MathCo; this is a change that was made and is likely the new norm now.

I have experienced something similar. It feels like I cannot “trust” GPT Pro-generated answers, as the quality and the time taken on the task are almost the same as normal 5.4 Thinking.

I’m really disappointed with what feels like a “downgrade.” I originally subscribed to ChatGPT Pro because its research capabilities were clearly better than the “Deep Think” feature in Gemini Ultra. But based on my recent experience, that advantage has largely disappeared.

Adding my experience here as a long-time $200 Pro subscriber — been on this tier since the o1 Pro days, so I have a reasonably long baseline to draw from.

The change I noticed aligns closely with what’s described in this thread, and the timing for me correlates directly with the rollout of the new $100 tier. Prior to that, extended thinking on hard, edge-case tasks was running 30–60 minutes and producing outputs that were substantively complete. Post-rollout, I’m seeing the same class of prompts resolve in under 10 minutes, with outputs that are coherent on the surface but miss the depth and specificity that made the Pro model worth the premium.

To be clear — I’m not opposed to speed improvements per se. If the model reaches the same quality faster, that’s a win. The concern here is that the quality itself appears to have regressed, particularly on edge cases and tasks requiring sustained multi-step reasoning across a large context. That’s precisely the workload I subscribed for.

The hypothesis I keep coming back to is whether compute is being redistributed — either to service the broader $100 subscriber base, or redirected toward Codex infrastructure. I have no inside knowledge, but the timing and the nature of the degradation (faster, shallower, less persistent on hard problems) is consistent with a lower effective reasoning budget being assigned to Pro Extended Thinking sessions.

Would genuinely appreciate OpenAI acknowledging whether default thinking budgets for Pro have changed, and if so, whether there is an explicit path for $200 subscribers to restore the prior depth. Transparency here matters — this tier is positioned as the highest-capability option, and a silent downgrade without communication is difficult to reconcile with that positioning.

Just wanted to add that I have also been experiencing a significant regression of the Pro models. Thinking time is the most obvious signal, but as the previous users have said, I have also noticed a significant decline in model performance.

For me, my usage is mostly in mathematics, and I have noticed much more common misunderstandings and hallucinations in my usage. This is concerning because the pro subscription is costly and I am currently not getting the performance that justifies the cost.

Hopefully this gets resolved soon.

Well, with the release of 5.5 this makes sense, and I guess it explains the pattern of temporary regression prior to a new model being shipped. It’s back to what I am familiar with.

After the release of 5.5, and after running numerous different tasks on GPT Pro, I am glad to say that it’s running on all cylinders! So very glad about that. Hope everyone else is having the same experience.

It did go back to normal, but after constant use:

from: Why is ChatGPT Pro temporarily limited on my account?

OpenAI needs to provide transparency instead of silent throttling, especially for ChatGPT Pro (20x) and the $100+ tier.

Right now, when throttled, the UI doesn’t gray out or disable for Pro 20x users like it does for the $100 tier. Instead, it silently degrades the service by dropping you to lower models or capping compute at 6 minutes, so no real work gets done. OpenAI should add explicit Plan Usage Limits like Claude does, or like your own (OpenAI) Codex dashboard.

As a ChatGPT Pro (20x) subscriber, I am currently throttled with zero visibility into the reset schedule. I canceled my subscription until there is more clarity on usage limits, or until I build a custom telemetry tool to monitor the metrics and work out for myself when the limits “reset”, so I know when to use it again.

All of this is waste, and everyone could be happy if you just mirrored your own Codex Usage Limits for ChatGPT Pro (i.e., paying customers).
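For reference, the kind of custom telemetry tool mentioned above could start as something like this sketch. It assumes you note each run's start time and thinking duration by hand; the names `Run`, `looks_throttled`, and `throttle_window`, the 25% threshold, and all sample timestamps and durations are made up for illustration (only the roughly 6-minute cap comes from my own observation above).

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Run:
    started: datetime
    thinking_min: float  # observed thinking time in minutes, noted by hand


def looks_throttled(run: Run, baseline: list[Run], ratio: float = 0.25) -> bool:
    """Flag a run whose thinking time is far below the baseline median."""
    return run.thinking_min <= ratio * median(r.thinking_min for r in baseline)


def throttle_window(runs: list[Run], baseline: list[Run]):
    """Return (first throttled start, first recovered start or None)."""
    start = end = None
    for run in sorted(runs, key=lambda r: r.started):
        throttled = looks_throttled(run, baseline)
        if throttled and start is None:
            start = run.started
        elif not throttled and start is not None:
            end = run.started
            break
    return start, end


# Made-up sample data: three normal baseline runs, then a throttled stretch.
baseline = [Run(datetime(2026, 1, d, 10), m) for d, m in [(1, 30), (2, 45), (3, 60)]]
runs = [
    Run(datetime(2026, 1, 4, 10), 50),
    Run(datetime(2026, 1, 5, 10), 6),
    Run(datetime(2026, 1, 6, 10), 6),
    Run(datetime(2026, 1, 8, 10), 48),
]

start, end = throttle_window(runs, baseline)
print(start, end)
```

With enough logged windows, the gap between `start` and `end` would give a rough estimate of the reset schedule that the UI does not show.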

ChatGPT Pro (Non-Codex) User

It seems like I’m still hitting very limited thinking times on GPT Pro. The first day after GPT 5.5 Pro came out it was fine, but from the second day onwards it seemed to be nerfed. Not sure if anyone else experienced this?