GPT 5.5 seems to be degraded

After the usage syncing issue fix (and maybe during the week before that fix), I see noticeable degradation in prompt response quality. I know this is subjective and I don’t have solid empirical evidence to show you, but the main pattern I am seeing consistently now is a clear inability to follow the instructions given. At times it eagerly declares an outcome or a patch, but the result is incorrect or the code quality is bad enough to cause major regressions.

My prior impression of GPT 5.5 was that it could fix issues very well. I don’t think the codebase has changed or my workflow has been altered. I am primarily working with Flutter and Xcode, and it seems to have suddenly become incapable of one-shotting prompts even for the simplest UI changes, which is uncharacteristic of its prior performance.

I see several other mentions of this across social media, so I can’t be alone in observing this noticeable delta over time. My biggest issue is that GPT 5.5-med now causes the most regressions and coding issues, whereas before I never had to reach beyond it for UI tasks. Using GPT 5.5-high is also not giving me a really noticeable uplift. In desperation I have resorted to GPT 5.5-xhigh, but I notice that it still is not running as long as it used to. Previously xhigh would run for hours and was amazing at fixing issues even when I packed them into the prompt. Now it does not seem as capable.

I am also observing some changes to the ChatGPT Pro models. Previously they would think for a very long time and rarely respond in under 10 minutes, but now they respond much quicker and often with less material attached.

Hoping these issues can get some attention and investigation.

Thanks again for the amazing work OpenAI!

Agreed. Been facing the same issue. Moved back to 5.4, that seems to be working fine for now.

Same feeling here. I use GPT 5.5-xhigh, and it hasn’t performed as well as it did last week, especially on long tasks: frequent errors and not following the workflow.

I am having a similar experience this morning. Very long turns for simple queries, and when it derails and you interrupt/steer, it just ignores the commands. If you do manage to interrupt, it still ignores what has been said and carries on with its previous plan. Something has changed here since last week.

Same for me. I feel Codex has become much slower over the last several days, and it always gets stuck during the task. It’s very frustrating to wait for a long time and find there has not been any progress.

Have you tried:

  1. writing prompt manually
  2. having ChatGPT improve your prompt
  3. giving it the improved prompt?

I completely agree. It wouldn’t be wrong to say that GPT-5.5 currently behaves like 5.3. It couldn’t solve even a very simple task I gave it, despite me repeating the request 15-16 times and providing different perspectives. When working on the project, it acts irresponsibly. In the first days, it worked incredibly effectively, but now it strangely feels as if it has been downgraded.

However, it is not entirely bad. It is still a very good and powerful model. But there is no trace left of the extraordinarily impressive model we saw in those first days.

That’s a very peculiar observation. It feels exactly like 5.3 Codex, where fixing even simple bugs takes way more prompts than it used to.

The only way it’s useful is to keep using xhigh/high. It’s weird, because I had been using 5.5-med without issues and now it’s suddenly no longer useful.

Hello everyone,

Thank you for your reports. This issue was recently addressed in a fix released by the OpenAI team.

Please let us know if you continue to experience this issue on your end.

Can you confirm when? My experience was from Monday, but I don’t think there has been a CLI release since then. Or was this an OpenAI back-end change?

This is still happening to me. Codex frequently gets stuck in the Thinking state at different points during the workflow, especially at the very beginning of a new thread, without showing any feedback or progress for a long time. Sometimes it does eventually work, but the speed is unbearably slow, even for simple tasks. I’ve been experiencing this issue since last Saturday.

At this very moment, the issue still persists. Codex is not merely slow; it repeatedly becomes stuck in the middle of tasks, showing no progress even after prolonged waiting, which is truly exasperating.

I have also been encountering this issue since Monday. Although I updated Codex to the latest version today, the problem still persists. Model: GPT-5.5.

Thanks for following up. I’ve forwarded this to the team.

Apologies for the inconvenience. Could any of you please submit a support request at support@openai.com? I am requesting this because we need some personal details, such as the session id, and we cannot take those details here in a public forum. Before you open a request, please:

  • Reproduce the issue in a new chat on Codex, then right-click and copy the session id.
  • Run /feedback and share the details of the issue there.

Once done, please share the session id in the support request and a short summary of the issue. After submitting the support request please share the case id/case subject here so our team can track and troubleshoot the issue for you. Thanks again for flagging it!

Thank you, I will test to see if anything has improved.

I completely share this ‘opinion’ or experience. I started using 5.4 again for extended thinking, and 5.3 Codex for the code (only using 5.5 for simple tasks like GUI changes, as it is faster).

There is also a Reddit topic discussing the exact same findings.

Just reporting back here, I cannot tell if the issue has been fixed but it is catching the errors that were caused before the fix, so I don’t know if that is an indication of improvement or not.

What does surprise me is that this is 5.5-xhigh, and it’s clear that I have to use the most expensive model to really get meaningful work done. Previously, before the degradations were noticed, I wouldn’t have reached for xhigh except for planning or one-off problem solving that 5.5-high or med could not fix.

I don’t have enough information or confidence yet to switch back to gpt-5.5-med/high. I will continue trying different prompts to see if it can yield any more meaningful changes.

Thanks again for your attention to this matter.

I’m reporting back again and I definitely don’t think the issue has been fixed. I can’t quite put my finger on it, but there’s just something about it that still feels off. Normally, I could always rely on xhigh to break through a tough problem, but it still doesn’t seem as capable.

Basically, I’m working with Flutter and a lot of Dart code, and I have been using Codex since September last year. In its current state it’s quite frustrating, because it feels like we’ve regressed.

Also, this shouldn’t be a context issue, as I’m running gpt 5.5 xhigh on a fairly new convo. I wish I could offer more information, but it’s tough to convey this “vibe” so to speak. All I can tell you is that it does not seem to have the same punch as it used to. I don’t think the codebase has grown large either. It’s just that a few weeks ago I started noticing gpt 5.5 make more sloppy mistakes, and I don’t think we are quite out of the woods yet.

I’ve finally had to resort to Opus 4.8 to audit gpt-5.5-xhigh’s work, and it confirmed the very crux of my complaint with 5.5: it did surface-level edits dressed up as deep work, and it escalated a lot of false positives as legitimate problems to solve without consulting me.

Once Opus 4.8 was able to identify my intent by analyzing GPT 5.5’s code, it was able to one-shot all the issues that 5.5-xhigh had been “working on” for the past few days…

Again, this is subjective, but Opus 4.8 confirms the common critique raised of GPT 5.5 and thus raises another concerning point: GPT-5.5-xhigh was not able to catch itself and dig us out of the hole.

This isn’t the first time I’ve experienced this (I saw it in 5.4 too). Originally I was very excited to use GPT 5.5, as it appeared to have addressed the shortcomings, but it seems like this tendency to perform “exhibitions” rather than truly deep work, and to not recognize issues, is back.

This is the best I can do in offering insight into what feels “off”. I hope it is helpful in improving the model, but I do wonder why we constantly go through this roller coaster ride where a model feels great and then, right around the time a new model is expected, things just seem less put together.

I’ve not used Claude in months since Codex has taken over ($200/month plan here), but this session with Opus 4.8 crystallizes the issues I’ve been having with 5.5 recently, and for the first time I am asking whether I should invest more into Claude. I know Opus 4.8 just came out and I need to run more tests, but a $20/month plan suddenly solving what a $200/month plan has been struggling with for the past few days makes it worthwhile to explore integrating it into my workflow.

I know the OpenAI team is hard at work here and I trust that they can make this work!

Thank you again for reading this long rant/review.

This absolutely is the case. I just renewed my subscription and honestly I regret it.

Instruction following has become just terrible.

E.g. mentioning a harsh ETA for one feature long ago in the chat history somehow made the model extrapolate it to the whole work: when reflecting on its own behavior, it implied it had been told to “rush things” (it never had been).

It became viscous: it doesn’t prioritize the current focus and direct steering over the rest of the context. This is a clear regression and it’s making it nearly impossible to deliver something of proper quality.

Worth noting that code quality itself has terribly degraded as well.

Despite being given clear, strict quality gates (use native language features and align with the language’s best practices, in my case Nix), it just created a long, messy bash script inside a Nix derivation callback.

No changes were made in configuration on my end to blame.
No style changes in how I steer it.

Please do something about it, OpenAI.