What on earth is the story with Codex and GitHub?

Hi, I just wanted to confirm what you found out:

  • Codex and PRs = nightmare
  • Codex is super lazy

I find that both OpenAI and Anthropic vastly “embellish” reality. Their tech is certainly great - they give the public access to incredible models - but their marketing is borderline malpractice.

Regarding Claude Code, I was really angry, because they implied that the $100/$200 plans are usable in real production, which won’t be the case until the context length is at least 500k. For the moment, you need so much prompt engineering that it is just easier to use a normal LLM. Gemini 2.5 with its 1M-token context length is a fantastic coding experience: it doesn’t need to launch 50 tools to search for files and read them partially, it just reads what is in the context; though it does end up having attention problems.

Anthropic toned down their marketing claims a little, and they allowed Claude Code on their $20 plan, at roughly the same time as OpenAI did it with Codex for Plus users, so… we are left with:

  • one agent that happily goes off on an adventure when asked for a feature, is great at planning and understanding the big picture, but forgets everything after two autocompacts while victoriously hallucinating that “perfect! everything is implemented”
  • and another that opens 3 PRs for every semicolon changed, and takes half-steps even when we insist on climbing the whole flight of stairs. I ended up just using the patch feature (applying the changes in a local IDE through a copy-paste patch).
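For anyone who hasn’t tried the copy-paste patch route: it amounts to saving the unified diff the agent prints into a file and applying it locally. A minimal sketch, assuming the agent emits a standard unified diff — the file names (`app.txt`, `agent.patch`) are made up for the demo:

```shell
mkdir -p demo && cd demo
git init -q                      # git apply resolves the a/ and b/ path prefixes
printf 'hello\n' > app.txt       # stand-in for the file the agent edited

# Paste the agent's unified diff into a patch file (inlined here for the demo).
cat > agent.patch <<'EOF'
--- a/app.txt
+++ b/app.txt
@@ -1 +1 @@
-hello
+hello, world
EOF

git apply --check agent.patch    # dry-run: verify it applies cleanly
git apply agent.patch            # actually modify the working tree
cat app.txt                      # prints: hello, world
```

The `--check` dry-run is the useful part in practice: when the model has hallucinated line numbers or context, it fails loudly instead of half-applying.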

There is still a long way to go before those agents are usable the way they advertise, in real production. Anthropic’s “cash grab scam” period really damaged my confidence in that company, but to be honest, OpenAI also kept Codex exclusive to their $200 plan for a few weeks.

I’ll just wait for this tech to improve significantly before becoming a cash cow again.
