Why does Codex repeat the same mistakes?

These are good points, and I think this is the right direction for analyzing the problem.

I asked Codex to inspect the repo instead of guessing, and the useful finding is this: the constraint is already documented in multiple places, including the agent guidance and the docs on deterministic rails. So I do not think the issue is simply “there is no documentation” or “the agent was never told.”

However, the audit also found a real gap: there are targeted guards, but not one universal production-code scanner that blocks every possible static user-facing template, keyword-triggered reply path, or post-generation deterministic reply rail.

So I agree with the test/invariant point. In an AI-assisted workflow, tests should not only verify normal behavior. They should also protect architectural invariants. In this case, the invariant should be something like: user-facing behavior must remain model-backed through the canonical runtime path, except for explicitly justified narrow exceptions.
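
As a rough sketch of what I mean by a test that protects the invariant rather than normal behavior: assume every reply carries a provenance tag, and the test fails whenever a reply did not come from the model path or from a registered exception. All names here (`Reply`, `source`, `capability_id`, `CAP-HEALTHCHECK`) are hypothetical and not taken from the repo.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry of narrow, explicitly justified exceptions.
JUSTIFIED_EXCEPTIONS = {"CAP-HEALTHCHECK"}


@dataclass(frozen=True)
class Reply:
    """A user-facing reply plus a record of which path produced it."""
    text: str
    source: str                          # "model" or "deterministic"
    capability_id: Optional[str] = None  # required for deterministic replies


def owned_by_model(reply: Reply) -> bool:
    """Return True if the reply satisfies the ownership invariant."""
    if reply.source == "model":
        return True
    # Deterministic replies pass only with a registered capability ID.
    return (
        reply.source == "deterministic"
        and reply.capability_id in JUSTIFIED_EXCEPTIONS
    )


def assert_invariant(replies: list[Reply]) -> None:
    """Fail loudly, listing every reply that bypassed the model path."""
    bad = [r for r in replies if not owned_by_model(r)]
    assert not bad, f"replies bypassed the canonical runtime path: {bad}"
```

The point of the design is that the check does not care what the reply says, only who produced it, so a renamed “fallback” or “guard” path would still trip it as long as it has to declare its source.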

I also agree with the point about instruction semantics. Repeating “do not implement X” may keep X active in the model’s context, and the model may later reintroduce it under another name such as fallback handling, guard blocks, or deterministic blocks. So I probably need to express the rule as a positive invariant rather than repeatedly describing the forbidden pattern.

My current takeaway is:

  1. create a small canonical “user-facing reply ownership” invariant doc;

  2. reference it directly from AGENTS.md;

  3. add CI checks/scanners for hardcoded user-facing reply paths;

  4. maintain an allowlist for narrow deterministic exceptions;

  5. require explicit justification labels/capability IDs for any deterministic rail.

That said, my reliability concern remains. The frustrating part is that Codex often acknowledges the constraint correctly in conversation, then violates the same architectural boundary again during implementation. So yes, I can improve the harness and invariant checks, but I still think this is a real constraint-preservation failure mode in larger agent workflows.