I’m stuck trying to diagnose a Codex Cloud failure and I’m not even sure what knobs I can tune.
Error:
the job was killed during the <step> step (likely due to resource limits)
What’s happening:
Any time I launch this task in Codex Cloud, whether from WSL + VS Code or from chatgpt.com/codex, it dies early with the message above. I can't find any documentation on what the actual resource limits are (CPU, RAM, disk, runtime, GPU VRAM, etc.).
Local behavior:
Running the same task locally works fine and takes anywhere from 3 minutes to 1.5 hours, depending on the size of the input data file. My local machine is an AMD Ryzen 9 7950X3D + RTX 5090, so it's entirely possible I'm exceeding Cloud limits without realizing it.
What I need clarity on:
I don't mind waiting longer for extra resources to be allocated, if that's even an option, or requesting a larger quota for a single task if there's a way to do so. I also don't know the expected max runtime before Cloud kills a job, but the recent announcements made it sound like tasks have run for up to 24 hours.
What I’m trying to figure out:
- What are the actual Codex Cloud resource limits?
- Does "resource limits" mean RAM, CPU time, disk, GPU, runtime, or something else?
- Is there a way to request more resources or a longer allowed runtime?
- Any guidance on how to profile which part of my task is exceeding whatever limit exists?
Right now the job just dies and gives me no insight into what’s hitting the ceiling, so I don’t know what to optimize or scale back. Any direction would help.
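For context, the best idea I've had so far is to instrument the task myself so the last log line before the kill at least tells me which phase was running and how much memory it had used. This is just a sketch assuming the task is Python on Linux; the phase names and the stand-in workload are placeholders, not my real pipeline:

```python
# Minimal self-instrumentation sketch (assumes Python on Linux).
# Each checkpoint prints elapsed wall time and peak RSS, flushed
# immediately so the output survives even if the process is killed.
import resource
import time

START = time.monotonic()

def log_usage(label: str) -> None:
    """Print elapsed wall time and peak RSS (KiB on Linux) for this process."""
    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"[{time.monotonic() - START:8.1f}s] {label}: peak RSS ~{peak_kib} KiB",
          flush=True)

log_usage("start")
data = [0] * 1_000_000       # stand-in for loading the real input file
log_usage("after load")
total = sum(data)            # stand-in for the heavy compute step
log_usage("after compute")
```

If the Cloud kill is memory-driven, the last checkpoint that printed would at least bound where the spike happens, but I have no idea whether the limit is actually RAM, which is part of my question.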
Thanks!