I really want to like Codex and obviously you guys are pouring a tonne of capital resources into it. But the reality is that using it as a professional is very hard.
1. What is going on with PRs and naming?
Obviously naming isn’t OpenAI’s forte and in models it’s kind of funny. But in branches it’s not. Branches are created with arbitrary names, and it’s super unclear at any given point whether you’re opening a new branch or updating an existing one. Sometimes you’re forced to switch to a new branch halfway through.
Writing code is fast but following what’s going on in GitHub is a nightmare.
2. If you use GitHub then that’s where you want to interface with Codex
I guess someone decreed that you should be able to use Codex from mobile. Which is great. And it’s nice that I can get a job going late at night from my phone and wake up to it delivered in the morning.
But which branch or PR that job was delivered to is a mystery. There’s often no explanation of what was done or why. And since Codex is at best lazy and at worst completely ignores the job it was supposed to do, the first thing you need to tell it in the morning may be to do the job it was drafted to do the previous night.
The UX to do code reviews, suggest changes, see diffs between code reviews and suggest follow-ups already exists on GitHub.
Anthropic has the right idea. I prefer coding locally with your models. I REALLY like coding with 4.1 and I love the idea of free usage of o3 remotely, but the reality is that both the Codex model and the tooling around it feel way inferior to coding locally with Windsurf and 4.1 (even though Codex is supposed to be using a larger model).
3. Why does Codex reinvent the environment setup file?
The Codex setup file doesn’t live in your filesystem; it’s copied and pasted into Codex so there’s no source control. You can’t view the changes easily on GitHub and when you open multiple branches it’s anyone’s guess which one is which. You can’t test your setup file, so if it breaks you have to do the whole thing over again (and did I mention it’s not in source control?). And it’s entirely proprietary.
Many of us have already defined such environments in CI scripts and Dockerfiles. Why ask for it to be done again inside your own environment? It requires more time and is another decoupled source of maintenance.
It feels like this was all developed by developers who don’t need to integrate with other services or manage their own CI. But if you have Postgres and Redis and queues and frontend builds and full-stack tests then this is hard.
4. Codex is often lazy
It is SO HARD to get Codex to actually do the work that you ask it to do. I find this astonishing because 4.1 in Windsurf is incredibly good at following instructions. And o3 is very powerful. And yet Codex feels like neither. It frequently ignores instructions entirely. Or it decides the job is too long and quits without doing any of it.
Codex needs to break tasks down, create todo lists, follow them through and, when it doesn’t complete them, suggest spinning up another job. It needs the self-awareness to know whether it has actually done the work involved. And the humility to apologise and suggest changes when it hasn’t.
I really, really want to like Codex but right now it feels very rough.
But I hope you get it right as you have some amazing models (4.1 is really exceptional) and this is clearly the future of software development.
Peter