I’ve been using the web version of Codex since the very first day of its release, and I can confidently say that until around mid-September it was an outstanding tool.
After the release of what seems to be the GPT-5-Codex update, things have gone downhill fast.
Here’s what’s happening now:
Codex no longer completes tasks reliably — in roughly two-thirds of all cases, tasks either hang indefinitely or end with “I could not do this task.”
It makes a huge number of mistakes and regressions, even in simple, previously stable workflows.
The new “code review” feature, which is supposed to help, now actually highlights how bad things have become — it finds bugs and logical inconsistencies in almost every piece of code generated by Codex itself, forcing me to rerun and re-fix the same task over and over.
Because of this, two-thirds to three-quarters of all consumed limits go not into building, but into cleaning up Codex’s own mistakes — and that’s on top of the fact that the new usage limits no longer allow running tasks at the same pace and volume as before.
Front-end generation has become absurd — it ignores provided designs and outputs something completely unrelated.
To give some context — in August, I wrote over 300,000 lines of solid code with Codex.
Now, more than a month later, I can’t even isolate one persistent bug, and I’m unable to render a mobile UI without launching separate tasks for every single component.
Honestly, the best decision right now would be to roll everything back to the late-August state and rebuild from there.
Because right now, you’re losing developers who were genuinely invested in this tool — when the GPT-5 model embedded in third-party agents performs better than OpenAI’s own core Codex service, that’s a serious signal that something is fundamentally broken.
Update:
I saw the recent announcements proudly highlighting that Codex can now “write code for up to six hours in a single session.”
But you have to understand — nobody needs that.
What developers actually need is the opposite:
→ small and precise tasks,
→ executed quickly and accurately,
→ so we can test, validate, and iterate immediately.
Long-running sessions are meaningless when after 1–2–3 hours of continuous work there’s no guarantee the code isn’t riddled with hidden bugs.
Because of that, we’re forced to overcomplicate our prompts — making Codex re-check and re-verify its own output multiple times, trace affected code paths, and cross-check logic.
That in turn overloads your servers even more.
And with the constant task freezes and “I couldn’t do this task” messages, we end up running the same job 4, 8, or even 12 times in parallel just to get one usable result.
So yes — technically Codex can “run for 6 hours,”
but practically, it can’t finish a 6-minute job reliably anymore.
If you’re genuinely looking for help testing Codex capabilities, refining prompts, or tuning advanced workflows — I’m open to collaborate.
Many of us here are experienced users who’ve been pushing this tool to its limits from day one, and I’m sure other developers would also be glad to help.
We all want the same thing — to make Codex not just powerful, but truly great.
I was loving it, but I am ready to look for other options at this point. I have to ask it three times to fix a simple thing, and then, when even Codex realizes it keeps attempting the same fix, it says “I give up, I can’t do it” and refuses to continue even when I give it an alternate working fix.
It’s in its infancy. I recently posted something I never thought it could do, but it did. I think with all models the beginning is always the worst; by the time they release GPT-6 we’ll all be saying how bad it is and everyone will be using GPT-5. But I think the more we share real-world problems, the more likely someone will see them and take action.
I had to cancel my sub after today’s fiasco with the new model and the bug with usage limits out of nowhere.
Not sure why they are rolling out new things if they are not sure how they will behave once live.
Everyone is leaving OpenAI models now; they are too generic and restrictive. Their models also have rules that block anything OpenAI deems unethical, such as creating your own cognitive model, which no one should be able to control in the first place. Seems shady to me.
The Codex models inside VS Code keep hanging. It was working fine and then all of a sudden… nothing. Is there an outage or something? Does OpenAI’s Codex have a status page? It seems to be down.
I found it really good while it was still in preview. But once they rolled it out for everyone, and with GPT-5, it’s been almost unusable even for the smallest tasks.
So true! I’m glad I’m not the only one experiencing this. This degradation has been enough to affect my brand loyalty. I use Cursor as well, and they’ve been improving while Codex is getting worse. I also started using Antigravity (Gemini), which is really good, even if it also has unreasonable rate limits so far.
Now Codex review barely works and hits rate limits right away.
I don’t understand why they don’t let us choose the Codex model, with rate limits on the top model and a fallback to the previous one. GPT-5-Codex was great; GPT-5.1-Codex-Max is the worst.
Codex used to work really well, but after the recent update it’s become slow, unreliable, and full of mistakes. Tasks fail, the code review tool catches problems Codex creates itself, and even simple UI work now takes too much time. This is slowing down my work, and I just want Codex to go back to the stable version we had before. It was a great tool, and I hope it gets back to that level soon.
I moved nearly everything OpenAI-related to other platforms. Codex is painfully slow and cannot seem to grasp what I ask it to do. After o3, it was all downhill.
Gemini has been super fast for code and then I polish it up with Claude.
I can get all this completed cheaper and faster before Codex even starts to work.
If the agent were good at completing complex tasks, I would be 100% into a longer run time, but they’ve given me an agent that (a) takes longer to think and plan, and then (b) quits after 20 minutes because it “ran out of time to complete the task.”
This version of Codex is a disaster – the Yahoo of independent coding agents. I wonder if they will be able to save it or if it will just fade away. Would you trust it again if they came back with a big mea culpa and fixed it? Or is it already too late?
I am unsure what they have done to Codex or why, but the issue extends beyond Codex itself—they have altered the older versions of GPT-4o and GPT-4.1 as well. These models have simply ceased to perform tasks; they merely state their intentions without actually accomplishing anything. I suspect they have changed the prompts and restricted reasoning capabilities to encourage everyone to migrate to the latest models.
As for Codex, it no longer executes tasks properly. For instance, connecting a single frontend page to a fully functional and tested API has become impossible without encountering bugs that take half a day to fix.
At this point, I have completely abandoned Codex, downgraded my subscription to Plus, and now use a Chinese agent powered by Gemini under the hood. No, my new agent does not perform as well as Codex did in the summer of 2025, but it is ten times more effective than Codex is now.
It has become abundantly clear to me that feedback is of no importance to the Codex and OpenAI teams; their communication with users and developers is an utter failure. Judging by the situation, either the team has lost its engineering expertise, or it is being stifled by marketers.
I’m not sure if this is the point of this thread, but in addition, the web UI is totally unusable. I have disabled all major extensions, and it still routinely freezes in Chrome and spikes memory/CPU. It’s boiling my Apple Silicon MacBook.
It’s frankly unbelievable that this is a major product from a major company. I seriously wonder if they have been vibe coding it.
I am almost finished migrating most of my work to other services. This was neat when it came out, but instead of growing it they immediately ran it into a ditch, with problems ranging from silly usability issues in the interface (I’m sure I’m in the minority, but wow, this is as bad as I’ve ever seen an app routinely crash) to the actual substantive heavy lifting of the coding itself.