Hi all,
I’m hitting repeated failures when working with Codex Cloud sessions, and they appear to be an infrastructure-level issue rather than something in my code or workflow. I’m documenting the behaviour here in case others are experiencing the same thing, or in case the OpenAI team needs a reproducible report.
What I’m Seeing
Across multiple attempts to run or resume Codex Cloud sessions, I receive variations of the following:
- “Error: Failed to read output from session ‘shell’. This session may be corrupt. Please start a new session.”
- Attempts to reopen the session result in Codex thinking the session might still be alive, then failing to connect.
- Starting a new session also fails intermittently.
- Codex repeatedly speculates about session limits, container-tool recovery, or infrastructure availability, but every retry leads to the same outcomes.
- Heartbeat checks and feed_chars attempts don’t revive the session.
- Switching to a different session name (e.g., “session1”) fails as well.
- The system sometimes reports 502 CAAS errors, blocking any container interactions.
- When trying fallback strategies (minimal scripts, subprocess checks, specifying ports, waiting between retries), the environment consistently fails to start.
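For anyone who wants to reproduce the "waiting between retries" fallback in a comparable way, here is a minimal sketch of a retry loop with exponential backoff that also records each failure for a report. It is only an illustration: `retry_with_backoff` and the `operation` callable are hypothetical names, not part of any Codex or CAAS API, and the session-start call itself is left to whatever your environment exposes.

```python
import time


def retry_with_backoff(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Returns (result, errors). result is None if every attempt failed;
    errors holds one message per failed attempt, in order.
    """
    errors = []
    for attempt in range(attempts):
        try:
            return operation(), errors
        except Exception as exc:
            # Keep every failure so it can be pasted into a bug report.
            errors.append(f"attempt {attempt + 1}: {exc}")
            if attempt < attempts - 1:
                # Wait 1x, 2x, 4x, ... the base delay before the next try.
                sleep(base_delay * 2 ** attempt)
    return None, errors
```

Passing in the call that starts a session (as a zero-argument function) yields either its result or the full list of per-attempt errors, which is the kind of log that would be useful to attach here.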
Observed Pattern
- Sessions appear to become corrupt or unreachable.
- New sessions fail to initialise.
- The container environment intermittently reports infrastructure errors.
- No code changes can be made because the tool never becomes available.
- Codex itself ultimately concludes it cannot continue due to environment failure.
Impact
- No commits or changes can be pushed.
- Tests cannot be run.
- Any operation requiring a container session effectively stalls.
Can someone from the OpenAI team confirm whether:
- There is a known outage or degradation affecting Codex Cloud sessions or CAAS infrastructure?
- There are new session limits or behavioural changes that we should be aware of?
- There are recommended recovery steps beyond the standard “start a new session” flow?
Happy to provide timestamps or additional logs if helpful.
Thanks in advance. I’m keen to resume work once the environment settles.