Codex Cloud: Persistent Session Failures, Container Errors & 502 Infrastructure Issues

EthicalAIExplorer · December 11, 2025, 12:30pm

Hi all,

I’m encountering repeated failures when working with Codex Cloud sessions, and it appears to be an infrastructure-level issue rather than something in my code or workflow. I wanted to document the behaviour here in case others are experiencing the same thing, or in case the OpenAI team needs a reproducible report.

What I’m Seeing

Across multiple attempts to run or resume Codex Cloud sessions, I receive variations of the following:

“Error: Failed to read output from session ‘shell’. This session may be corrupt. Please start a new session.”
Attempts to reopen the session result in Codex thinking the session might still be alive, then failing to connect.
Starting a new session also fails intermittently.
Codex repeatedly speculates about session limits, container-tool recovery, or infrastructure availability, but every retry leads to the same outcomes.
Heartbeat checks and feed_chars attempts don’t revive the session.
Switching to a different session name (e.g., “session1”) fails as well.
The system sometimes reports 502 CAAS errors, blocking any container interactions.
When trying fallback strategies (minimal scripts, subprocess checks, specifying ports, waiting between retries), the environment consistently fails to start.
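
For anyone trying to reproduce this, the "waiting between retries" fallback can be sketched roughly as below. This is only an illustration under assumptions: `start_session` is a hypothetical placeholder for whatever call opens the container session (it is not a real Codex API), and the attempt count and delays are arbitrary. It also records a UTC timestamp per attempt, which makes it easier to share logs afterwards.

```python
import time
from datetime import datetime, timezone


def retry_with_backoff(start_session, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call start_session() until it succeeds or the attempts run out.

    start_session is a hypothetical zero-argument callable that raises on
    failure. Returns (result, log); result is None if every attempt failed.
    """
    log = []
    for attempt in range(1, attempts + 1):
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            result = start_session()
        except Exception as exc:
            # Record the failure with its timestamp so it can be reported later.
            log.append({"attempt": attempt, "utc": stamp, "error": str(exc)})
            if attempt < attempts:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** (attempt - 1))
        else:
            log.append({"attempt": attempt, "utc": stamp, "error": None})
            return result, log
    return None, log
```

In my case a loop like this would exhaust every attempt, which is the point: the resulting log is a timestamped record of each failure rather than a fix.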

Observed Pattern

Sessions appear to become corrupt or unreachable.
New sessions fail to initialise.
The container environment intermittently reports infrastructure errors.
No code changes can be made because the tool never becomes available.
Codex itself ultimately concludes it cannot continue due to environment failure.

Impact

No commits or changes can be pushed.
Tests cannot be run.
Any operation requiring a container session effectively stalls.

Can someone from the OpenAI team confirm whether:

There is a known outage or degradation affecting Codex Cloud sessions or CAAS infrastructure?
There are new session limits or behavioural changes that we should be aware of?
There are recommended recovery steps beyond the standard “start a new session” flow?

Happy to provide timestamps or additional logs if helpful.

Thanks in advance—keen to resume work once the environment settles.
