Why does Codex repeat the same mistakes?

Rotto · June 13, 2026, 9:58pm

These are good points, and I think this is the right direction for analyzing the problem.

I asked Codex to inspect the repo instead of guessing, and the useful finding is this: the constraint is already documented in multiple places, including the agent guidance and the deterministic-rail-related docs. So I do not think the issue is simply “there is no documentation” or “the agent was never told.”

However, the audit also found a real gap: there are targeted guards, but not one universal production-code scanner that blocks every possible static user-facing template, keyword-triggered reply path, or post-generation deterministic reply rail.

So I agree with the test/invariant point. In an AI-assisted workflow, tests should not only verify normal behavior. They should also protect architectural invariants. In this case, the invariant should be something like: user-facing behavior must remain model-backed through the canonical runtime path, except for explicitly justified narrow exceptions.
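As a rough sketch of what a test protecting that invariant could look like: every name here (`Reply`, `canonical_runtime`, `check_reply_invariant`, the `CAP-RATE-LIMIT` ID) is hypothetical and not taken from the actual repo. It only illustrates the idea of tagging each reply with its origin and failing the test when a deterministic reply has no allowlisted justification.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical allowlist of narrow, explicitly justified exceptions.
ALLOWED_EXCEPTIONS = frozenset({"CAP-RATE-LIMIT"})


@dataclass(frozen=True)
class Reply:
    """A user-facing reply tagged with where it came from (assumed design)."""
    text: str
    origin: str                          # "model" or "deterministic"
    capability_id: Optional[str] = None  # required for deterministic replies


def canonical_runtime(message: str, model: Callable[[str], str]) -> Reply:
    """Stand-in for the canonical runtime path: the model produces the text."""
    return Reply(text=model(message), origin="model")


def check_reply_invariant(reply: Reply) -> None:
    """Raise AssertionError unless the reply is model-backed or allowlisted."""
    if reply.origin == "model":
        return
    if reply.origin == "deterministic" and reply.capability_id in ALLOWED_EXCEPTIONS:
        return
    raise AssertionError(
        f"user-facing reply is not model-backed and has no allowlisted "
        f"capability ID: {reply!r}"
    )
```

In a test suite, each reply produced by the code under test would be passed through `check_reply_invariant`, so a newly introduced static template fails the build instead of relying on the agent remembering the rule.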

I also agree with the point about instruction semantics. Repeating “do not implement X” may keep X active in the model’s context, and the model may later reintroduce it under another name such as fallback handling, guard blocks, or deterministic blocks. So I probably need to express the rule as a positive invariant rather than repeatedly describing the forbidden pattern.

My current takeaway is:

- create a small canonical “user-facing reply ownership” invariant doc;
- reference it directly from AGENTS.md;
- add CI checks/scanners for hardcoded user-facing reply paths;
- maintain an allowlist for narrow deterministic exceptions;
- require explicit justification labels/capability IDs for any deterministic rail.
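
A minimal sketch of how the scanner, allowlist, and justification labels could fit together. Everything specific here is an assumption for illustration: the `send_reply` / `reply_to_user` function names, the `# deterministic-rail: <ID> reason=...` comment format, and the `CAP-RATE-LIMIT` ID are made up, and a regex scan like this will miss indirect cases (for example a template stored in a variable first).

```python
import re
from dataclasses import dataclass

# Hypothetical label format: "# deterministic-rail: CAP-ID reason=why it exists"
ALLOW_MARKER = re.compile(
    r"#\s*deterministic-rail:\s*(?P<cap>[A-Z0-9_-]+)\s+reason=(?P<reason>\S.*)"
)
# Hypothetical reply functions called with a string literal as first argument.
REPLY_SINK = re.compile(r"\b(?:send_reply|reply_to_user)\(\s*[frbu]*[\"']")

# Hypothetical allowlist of approved capability IDs.
ALLOWLIST = frozenset({"CAP-RATE-LIMIT"})


@dataclass(frozen=True)
class Violation:
    path: str
    line_no: int
    message: str


def scan_source(path, source, allowlist=ALLOWLIST):
    """Return violations for hardcoded replies lacking an allowlisted label."""
    violations = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if not REPLY_SINK.search(line):
            continue
        # Accept the label on the same line or on the line directly above.
        prev = lines[i - 1] if i > 0 else ""
        marker = ALLOW_MARKER.search(line) or ALLOW_MARKER.search(prev)
        if marker is None:
            violations.append(Violation(
                path, i + 1,
                "hardcoded user-facing reply without a justification label"))
        elif marker.group("cap") not in allowlist:
            violations.append(Violation(
                path, i + 1,
                f"capability ID {marker.group('cap')} is not allowlisted"))
    return violations
```

A CI step could call `scan_source` on each production file and exit non-zero when the returned list is non-empty, which makes the boundary a build failure rather than a convention.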

That said, my reliability concern remains. The frustrating part is that Codex often acknowledges the constraint correctly in conversation, then violates the same architectural boundary again during implementation. So yes, I can improve the harness and invariant checks, but I still think this is a real constraint-preservation failure mode in larger agent workflows.
