What on earth is the story with Codex and GitHub?

I really want to like Codex, and I can see that a lot of resources have been invested into it. However, from a professional’s perspective, the experience is quite frustrating. I understand how much work has gone into this, but with the resources available to OpenAI, it still feels lacking in several key areas.

1. What’s going on with PRs and naming?

Naming isn’t usually a strong suit for models, and that can be entertaining in some contexts. But when it comes to branches, it’s less amusing. The naming and management of branches in Codex feel arbitrary at times. It’s often unclear whether you’re creating a new branch or updating an existing one, and sometimes you’re even forced to switch branches halfway through. While coding is quick, following the progress on GitHub can be really confusing.

2. The interface to Codex (for professionals) should be on GitHub

I get that Codex is accessible from mobile devices, and that’s a nice feature, especially for late-night tasks. But it often feels like a mystery where the job ends up. Viewing file diffs is not intuitive, and there’s often little explanation of the changes that were made or why. And since Codex can sometimes be slow or unresponsive, you often need to prompt it to complete the task it was initially set to do.

For professional environments, GitHub provides the best UX for collaboration, and that’s where Codex should be integrated. GitHub has spent years refining its UX for both humans and bots.

I appreciate the direction OpenAI is going with this, and I enjoy coding locally with 4.1 and using o3 remotely. But right now, Codex’s experience feels a bit fragmented and unpolished.

3. The environment should be docker-compose (or at least a standard CI actions setup)

The Codex setup file is quite detached from your local filesystem. It’s pasted into Codex, which means there’s no easy way to manage it in source control. And when you open multiple branches, it’s hard to know which one is which. If the setup file breaks, you essentially have to start over.

Instead of asking users to define professional-grade code in a proprietary setup file, it would be far more efficient to integrate with existing CI scripts and Docker files. Many of us have already set these up in our workflows, and it feels like Codex could benefit from a more standardized approach that’s easier to maintain and debug.

It seems like this was designed with a very isolated development environment in mind, but for those of us working with services like Postgres, Redis, and full-stack testing, it’s just not enough. We already have these processes defined in our CI pipelines.

4. Codex itself feels inconsistent

I’ve been really surprised by how difficult it is to get Codex to complete tasks reliably. While 4.1 in Windsurf is incredibly effective at following instructions, and o3 is powerful, Codex often falls short. It regularly ignores instructions or quits without completing tasks.

Codex needs to break down tasks into smaller steps, maintain to-do lists, and follow through on them. If it doesn’t complete a task, it should be aware enough to suggest starting a new job or making adjustments. It should also have the humility to acknowledge when it’s missed something and offer alternative solutions.

I really want to love Codex, but right now it feels like it’s far from being ready for professional use. It’s too complex for newcomers to get set up easily, but also lacks the robustness to integrate into a professional workflow. I’m hopeful that with more refinement, it will get there, as the underlying models (like 4.1) show so much potential. This truly feels like the future of software development, and I’m excited to see where it goes.

Peter

Here’s a case in point. I have three jobs open at Codex:

For some of these, I asked Codex to refine its original task and do more work. Ideally the result would be either the same PR updated with the new code or a new PR opened to pull the new batch of code into the original branch. And if there is a new PR, I’d expect its name to reflect the PR it’s being merged into.

Instead, these are the PRs that Codex opens up for me:

  • Document mail interface status
  • feat: add docs viewer
  • docs: expand angular docs

And it’s also created four (4!) new branches:

  1. d92ewq-codex/document-full-angular-project-angular-frontend
  2. codex/document-legacy-rails-email-interface
  3. yr5f5x-codex/document-legacy-rails-email-interface
  4. codex/create-documentation-viewer-and-compile-script

There is so little consistency. The original Codex requests do, just about, seem to match the branch names - although what do we think the matching is between branch 2 and branch 3? I assume that branch 3 is probably a refinement of branch 2 and meant to be merged into it but frankly I’ve no idea.
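To show the kind of guesswork involved, here is a rough Python sketch that groups the four branch names above by their task slug. It assumes, and this is only my guess rather than anything Codex documents, that a branch name is an optional short token, then `codex/`, then the slug. The `BRANCH_PATTERN` and `group_by_slug` names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed shape (a guess, not documented Codex behaviour):
#   [<short token>-]codex/<task-slug>
BRANCH_PATTERN = re.compile(r"^(?:(?P<token>[a-z0-9]+)-)?codex/(?P<slug>.+)$")


def group_by_slug(branches):
    """Group branch names that share the same task slug."""
    groups = defaultdict(list)
    for name in branches:
        match = BRANCH_PATTERN.match(name)
        # Fall back to the full name if the branch does not fit the guess.
        slug = match.group("slug") if match else name
        groups[slug].append(name)
    return dict(groups)


branches = [
    "d92ewq-codex/document-full-angular-project-angular-frontend",
    "codex/document-legacy-rails-email-interface",
    "yr5f5x-codex/document-legacy-rails-email-interface",
    "codex/create-documentation-viewer-and-compile-script",
]

for slug, names in group_by_slug(branches).items():
    print(slug, names)
```

Under that assumption, branches 2 and 3 land in the same group, which fits my hunch that one is a refinement of the other. It still says nothing about which should be merged into which, and that is exactly the information the naming ought to carry.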

But can you tell with confidence which PR title matches which task issued at Codex? I can’t. The tasks are all subtly different and because the PRs have been arbitrarily named I can’t be sure which PR relates to which task.

And then finally I have to match up the PRs to the branches. Since neither the branch names nor the PR titles tell me much, I’m really starting to get stuck now.

I know you have dreams of people firing off a job a minute but I cannot comprehend in what world that might be happening. You can’t even follow what’s going on between three of them!

I’m off to try Claude Code instead. I’ve got a lot of loyalty to OpenAI but the current state of Codex really is a mess.