Suggestion: proactive handoff and rescue workflow for Codex compaction failures

I wanted to share a recovery workflow we used today in Codex Desktop after a thread got stuck, because I think it points to a useful product enhancement for long-running project threads.

We had a very valuable Codex project thread become unusable after remote compaction started failing. Even a tiny prompt like “are you there?” triggered:

{
  "error": {
    "message": "Your input exceeds the context window of this model. Please adjust your input and try again.",
    "type": "invalid_request_error",
    "param": "input",
    "code": "context_length_exceeded"
  }
}

The hard part was that once the thread reached this state, it could not produce its own handoff summary. That is exactly when the user most needs one.

What appeared to cause the issue:

  • The thread was long-running and project-heavy.
  • It included many pasted screenshots/images over time.
  • Older embedded image payloads were still present in the local history even though they were no longer useful.
  • The compaction request itself appears to have become too large to fit within the context window.

The recovery pattern that worked:

  1. Opened a second helper Codex thread.
  2. Used it to inspect the local session history for the stuck thread.
  3. Created a durable handoff summary as a fallback.
  4. Backed up the original rollout/session file.
  5. Replaced embedded image payloads with lightweight placeholders while preserving the message/tool structure.
  6. Fully closed Codex so it would not rewrite cached state.
  7. Replaced the original rollout file in place with the repaired, image-stripped version.
  8. Restarted Codex and reopened the original thread.
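For anyone attempting steps 4 and 5 by hand, here is a minimal Python sketch of the backup-and-strip pass. It assumes the rollout file is JSON Lines (one JSON object per line) and that images are embedded as string values beginning with `data:image/`; neither detail is confirmed Codex behavior. The names `strip_images`, `repair_rollout`, the `.bak`/`.repaired` suffixes, and the placeholder text are all hypothetical.

```python
import json
import shutil
from pathlib import Path

# Hypothetical placeholder; any short string that marks the removal works.
PLACEHOLDER = "[image removed during rescue]"


def strip_images(value):
    """Recursively replace data:image strings, keeping every key and list position."""
    if isinstance(value, dict):
        return {k: strip_images(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_images(v) for v in value]
    # Assumption: embedded images appear as "data:image/..." string values.
    if isinstance(value, str) and value.startswith("data:image/"):
        return PLACEHOLDER
    return value


def repair_rollout(path: Path) -> Path:
    """Back up `path`, then write an image-stripped copy beside it (hypothetical helper)."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # step 4: keep an untouched copy of the original
    repaired = path.with_name(path.name + ".repaired")
    with path.open(encoding="utf-8") as src, repaired.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines rather than fail on them
            # A malformed line raises json.JSONDecodeError here; the original
            # file is never modified, so nothing is lost if this stops early.
            record = json.loads(line)
            dst.write(json.dumps(strip_images(record), ensure_ascii=False) + "\n")
    return repaired
```

Replacing only the image strings, rather than deleting whole records, is what keeps the message and tool structure intact. The swap in step 7 should happen only after Codex is fully closed, for example with `repaired.replace(path)`, which overwrites the original with the repaired file.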

After that, the original thread responded again and appeared intact.

A few numbers observed in this case:

  • Local rollout reduced from about 611 MB to about 52.7 MB.
  • Embedded data:image payloads reduced to zero.
  • JSON parse errors after repair: zero.
  • The thread became usable again after restart.
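Numbers like these are easy to check before and after a repair. The sketch below assumes the rollout file is JSON Lines and that images are embedded as `data:image/` strings (both assumptions, not documented Codex behavior); `audit_rollout` and `count_data_images` are hypothetical names. Sizes are reported in decimal megabytes, which may differ slightly from how a file manager reports them.

```python
import json
from pathlib import Path


def count_data_images(value) -> int:
    """Count string values anywhere in a parsed record that start with data:image/."""
    if isinstance(value, dict):
        return sum(count_data_images(v) for v in value.values())
    if isinstance(value, list):
        return sum(count_data_images(v) for v in value)
    return int(isinstance(value, str) and value.startswith("data:image/"))


def audit_rollout(path: Path) -> dict:
    """Report file size, embedded image count, and unparsable lines for a JSONL file."""
    images = parse_errors = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # blank lines are neither records nor errors
            try:
                images += count_data_images(json.loads(line))
            except json.JSONDecodeError:
                parse_errors += 1
    return {
        "size_mb": path.stat().st_size / 1_000_000,  # decimal MB
        "data_image_payloads": images,
        "json_parse_errors": parse_errors,
    }
```

Running it on both the backup and the repaired file gives a before/after comparison; a healthy repair should show zero image payloads and zero parse errors.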

Feature ideas this suggests:

  1. Codex could proactively detect when a long-running thread is approaching compaction failure risk and automatically write a durable handoff file before the thread becomes unusable.
  2. Codex could expose a built-in “rescue session” workflow for stuck threads, especially one that strips stale images or oversized tool payloads while preserving text context.
  3. When remote compaction fails because the compaction request itself is too large, Codex could explain that clearly and offer recovery paths: create handoff, strip media payloads, fork a repaired thread, or archive large assets.
  4. A helper-thread repair workflow could become an official pattern: one Codex thread helps summarize, reduce, or repair another under user control.
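As a rough illustration of ideas 1 and 3, the sketch below pairs a crude size-based risk estimate with a check for the `context_length_exceeded` error shown earlier. The 4-characters-per-token ratio, the 80% threshold, and the names `estimate_risk`, `CompactionRisk`, and `is_context_overflow` are hypothetical; a real implementation would use the model's own tokenizer and its own accounting for images.

```python
import json
from dataclasses import dataclass

# Hypothetical tuning values, not Codex settings.
CHARS_PER_TOKEN = 4       # crude heuristic, not a real tokenizer
HANDOFF_FRACTION = 0.8    # write a handoff once history reaches 80% of the window


@dataclass
class CompactionRisk:
    estimated_tokens: int
    context_window: int

    @property
    def fraction(self) -> float:
        return self.estimated_tokens / self.context_window

    @property
    def should_write_handoff(self) -> bool:
        # Trigger while the thread can still respond, not after it is stuck.
        return self.fraction >= HANDOFF_FRACTION


def estimate_risk(serialized_history: str, context_window: int) -> CompactionRisk:
    """Estimate how much of the context window a serialized history would use."""
    return CompactionRisk(len(serialized_history) // CHARS_PER_TOKEN, context_window)


def is_context_overflow(error_body: str) -> bool:
    """True if an API error body carries the context_length_exceeded code."""
    try:
        payload = json.loads(error_body)
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == "context_length_exceeded"
```

The point of the threshold is timing: the handoff is written while the thread can still summarize itself, and the overflow check gives the client a clear signal to offer rescue options instead of a generic error.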

I also posted the more technical version as a GitHub issue here:

The main reason I think this matters: the most valuable Codex threads are often the ones most likely to be long, media-heavy, and hard to replace. A graceful continuity/export/rescue path would prevent users from losing the exact threads they care about most.