I really want to like Codex, and I can see that a lot of work and resources have gone into it. From a professional’s perspective, though, the experience is quite frustrating. Given the resources available to OpenAI, it still feels lacking in several key areas.
1. What’s going on with PRs and naming?
Naming isn’t usually a strong suit for models, and that can be entertaining in some contexts. But when it comes to branches, it’s less amusing. The naming and management of branches in Codex feel arbitrary at times. It’s often unclear whether you’re creating a new branch or updating an existing one, and sometimes you’re even forced to switch branches halfway through. While coding is quick, following the progress on GitHub can be really confusing.
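To make the naming point concrete, here is a rough sketch of the kind of predictable scheme I have in mind. This is purely my own illustration, not how Codex works today: the `branch_name` function, the `codex` prefix, and the slug-plus-short-hash format are all assumptions.

```python
import hashlib
import re


def branch_name(task_title: str, prefix: str = "codex") -> str:
    """Build a predictable branch name from a task title (hypothetical scheme)."""
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", task_title.lower()).strip("-")
    # Keep the slug short so the branch stays readable in a branch list.
    slug = slug[:40].rstrip("-") or "task"
    # A short hash of the full title keeps names distinct if two slugs collide.
    digest = hashlib.sha1(task_title.encode("utf-8")).hexdigest()[:7]
    return f"{prefix}/{slug}-{digest}"


# The same task title always maps to the same branch name.
print(branch_name("Fix login redirect on mobile"))
```

With something like this, the same task always maps to the same branch, so it would be obvious whether a job is updating an existing branch or creating a new one.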
2. The interface to Codex (for professionals) should be on GitHub
I get that Codex is accessible from mobile devices, and that’s a nice feature, especially for late-night tasks. But it’s often a mystery where the job ends up. Viewing file diffs is not intuitive, and there’s usually little explanation of what changed or why. And since Codex can be slow or unresponsive, you frequently have to prompt it again to finish the task it was originally given.
For professional environments, GitHub clearly provides the best UX for collaboration. It has spent years refining that UX for both humans and bots, and that’s where Codex should be integrated.
I appreciate the direction OpenAI is going with this, and I enjoy coding locally with 4.1 and using o3 remotely. But right now, the Codex experience feels fragmented and unpolished.
3. The environment should be docker-compose (or at least a standard CI actions setup)
The Codex setup file is quite detached from your local filesystem. It’s pasted into Codex, which means there’s no easy way to manage it in source control. And when you open multiple branches, it’s hard to know which one is which. If the setup file breaks, you essentially have to start over.
Instead of asking users to maintain professional-grade setup code in a proprietary file, it would be far more efficient to integrate with existing CI scripts and Dockerfiles. Many of us already have these in our workflows, and Codex would benefit from a more standardized approach that’s easier to maintain and debug.
It seems like this was designed with a very isolated development environment in mind, but for those of us working with services like Postgres, Redis, and full-stack testing, it’s just not enough. We already have these processes defined in our CI pipelines.
4. Codex itself feels inconsistent
I’ve been really surprised by how difficult it is to get Codex to complete tasks reliably. While 4.1 in Windsurf is incredibly effective at following instructions, and o3 is powerful, Codex often falls short. It regularly ignores instructions or quits without completing tasks.
Codex needs to break down tasks into smaller steps, maintain to-do lists, and follow through on them. If it doesn’t complete a task, it should be aware enough to suggest starting a new job or making adjustments. It should also have the humility to acknowledge when it’s missed something and offer alternative solutions.
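As a rough sketch of what I mean by maintaining a to-do list and following through, something like the following would already help. It is only an illustration under my own assumptions: `Step`, `TaskPlan`, and the wording of the report are hypothetical names I made up, not anything Codex exposes.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class TaskPlan:
    """A goal broken into small steps that are tracked to completion (hypothetical)."""
    goal: str
    steps: list[Step] = field(default_factory=list)

    def add(self, description: str) -> None:
        self.steps.append(Step(description))

    def complete(self, index: int) -> None:
        self.steps[index].done = True

    def remaining(self) -> list[str]:
        # Everything that was planned but not finished.
        return [s.description for s in self.steps if not s.done]

    def report(self) -> str:
        left = self.remaining()
        if not left:
            return f"Done: {self.goal}"
        # Be explicit about what was missed instead of quitting silently.
        return (
            f"Incomplete: {self.goal}. Still open: "
            + "; ".join(left)
            + ". Suggest a follow-up job."
        )


plan = TaskPlan("Add password reset")
plan.add("Write migration")
plan.add("Add endpoint")
plan.add("Add tests")
plan.complete(0)
print(plan.report())
```

The point is not the data structure itself but the final report: if steps are left open, the agent says so and proposes a next job rather than stopping without explanation.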
I really want to love Codex, but right now it feels far from ready for professional use. It’s too complex for newcomers to set up easily, yet not robust enough to fit into a professional workflow. I’m hopeful that with more refinement it will get there, as the underlying models (like 4.1) show so much potential. This truly feels like the future of software development, and I’m excited to see where it goes.
Peter