GPT-5.3-Codex-Spark: fast model, but /goal state seems fragile after context compaction

EthicalAIExplorer · June 14, 2026, 8:11pm

I’ve been testing gpt-5.3-codex-spark in Codex on a real repository workflow, and I wanted to share an observation that may be useful for the Codex team or other users building longer-running /goal loops.

First: the model is blazingly quick. For short, bounded tasks it feels very strong. It moves fast, inspects quickly, and can get through small code/test iterations with very little friction.

The issue I’m seeing is with longer /goal workflows where the task depends on remembering already-validated state.

In one recent run, I gave it a goal that required:

inspect the repo state
make focused test-hardening changes
run lint/tests
validate the result
continue only if something remained unvalidated

It successfully ran a series of lint and test commands. Then, after a context compaction event, it appeared to lose track of the fact that those commands had already completed successfully.

After compaction, it resumed as though it still lacked a “validated test result”, even though it had already produced one earlier in the same /goal run. The practical result was a loop:

run tests
→ tests pass
→ context compacts
→ model no longer trusts / remembers the pass
→ rerun tests
→ still believes validation is missing
→ repeat

This is painful because the failure mode is not that the model cannot do the work. It can. The failure is that the /goal state does not seem to robustly survive context compaction.
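To make the loop concrete, here is a toy simulation. It is only a sketch of the pattern I observed, not how Codex actually works internally: `run_goal`, the `context` dict, and the `ledger` dict are all hypothetical names, and "compaction" is modelled as simply wiping the in-context record of the pass.

```python
def run_goal(compactions: int, durable_ledger: bool) -> int:
    """Return how many times the test suite runs in a toy /goal loop.

    Assumption: every test run passes, and every compaction erases the
    in-context memory of that pass. Nothing here reflects real Codex internals.
    """
    context = {"validated": False}  # in-context memory, lost on compaction
    ledger = {"validated": False}   # state kept outside the context window
    test_runs = 0

    for _ in range(compactions + 1):
        known_validated = context["validated"] or (
            durable_ledger and ledger["validated"]
        )
        if not known_validated:
            test_runs += 1                 # rerun the tests (they pass)
            context["validated"] = True
            ledger["validated"] = True
        context = {"validated": False}     # simulated compaction drops the pass

    return test_runs
```

With three compactions, `run_goal(3, durable_ledger=False)` returns 4 test runs, while `run_goal(3, durable_ledger=True)` returns 1. The work is identical in both cases; only the survival of the "already passed" fact differs.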

The model’s small/fast context profile seems to conflict with goal-oriented agentic workflows where the important state is not just “what files changed”, but also:

which validation commands have already run
exact pass/fail results
which findings were already resolved
which checks remain genuinely outstanding
whether the task is in implementation mode, validation mode, or landing mode

In other words, after compaction the model may remember the broad goal, but lose the execution ledger needed to avoid repeating itself.

The issue is structural: a fast Codex model can become expensive or counterproductive in long /goal workflows if compaction causes it to forget validated state and re-enter loops.

A possible mitigation might be for /goal runs to maintain a compact, durable task ledger outside the ordinary conversational context, something like:

Validated:

command A: passed at
command B: passed at
reviewer X: no blockers
git diff checked

Outstanding:

command C not yet run
PR body not yet generated

Mode:

validation / landing / blocker-fix / paused

Then after compaction, the model would have a reliable execution state rather than reconstructing progress from compressed prose.
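As a rough illustration of what such a ledger could look like, here is a minimal sketch that persists the three sections above to a JSON file. Everything in it is an assumption on my part: the `GoalLedger` class, its field names, and the file format are hypothetical, and I have no idea whether /goal would store state this way.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class GoalLedger:
    """Hypothetical durable task ledger kept outside the conversation."""

    validated: dict = field(default_factory=dict)    # check name -> ISO timestamp of the pass
    outstanding: list = field(default_factory=list)  # checks that have not run yet
    mode: str = "validation"  # validation / landing / blocker-fix / paused

    def record_pass(self, check: str) -> None:
        """Mark a check as passed now and drop it from the outstanding list."""
        self.validated[check] = datetime.now(timezone.utc).isoformat()
        if check in self.outstanding:
            self.outstanding.remove(check)

    def needs_run(self, check: str) -> bool:
        """True only if there is no recorded pass for this check."""
        return check not in self.validated

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "GoalLedger":
        """Reload the ledger after compaction; start empty if no file exists."""
        if not path.exists():
            return cls()
        return cls(**json.loads(path.read_text(encoding="utf-8")))
```

The idea is that after compaction the agent would call something like `GoalLedger.load(path)` and check `needs_run("tests")` before rerunning anything. A real version would presumably also need to invalidate recorded passes when the working tree changes; that is left out of this sketch.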

Another possible user-side workaround is to explicitly force the model to maintain a VALIDATION_LEDGER.md or similar file during long goals, but it feels like this is a core /goal orchestration problem rather than something every user should have to hand-roll.
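For anyone who wants to try the user-side workaround, here is a small sketch of helpers that append results to such a file and read them back. The line format (`- [PASS] ...`) and the function names `append_result` and `latest_results` are my own invention for illustration, not anything Codex provides.

```python
from pathlib import Path

LEDGER = Path("VALIDATION_LEDGER.md")  # hypothetical file name from the post


def append_result(command: str, passed: bool, path: Path = LEDGER) -> None:
    """Append one validation result as a Markdown bullet."""
    status = "PASS" if passed else "FAIL"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{status}] `{command}`\n")


def latest_results(path: Path = LEDGER) -> dict:
    """Map each command to its most recent result (True means passed)."""
    results = {}
    if not path.exists():
        return results
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("- [") and "] `" in line and line.endswith("`"):
            status, command = line[3:].split("] `", 1)
            results[command[:-1]] = status == "PASS"  # later lines overwrite earlier ones
    return results
```

The goal prompt could then tell the model to consult this file before rerunning any check, so a pass recorded before compaction is still visible afterwards.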

Has anyone else seen this with gpt-5.3-codex-spark or other smaller/faster Codex models? In particular, have you noticed post-compaction loops where the model repeats tests or validation because it no longer trusts the earlier result?
