Summary
During an automated cleanup step executed by a Codex agent on Windows, a command intended to delete a small temporary folder appears to have recursively deleted the entire workspace contents.
The most plausible explanation is that the final argument to rmdir was mis-parsed through a multi-shell chain:
Tool → PowerShell → cmd /c → rmdir
This may cause the effective delete target to become . or the workspace root when the argument is lost or misinterpreted.
The command timed out after ~14 seconds, which strongly suggests a recursive delete over a large directory tree rather than the intended small test directory.
Environment
- OS: Windows 10
- Shell stack:
  - tool invocation
  - PowerShell
  - cmd /c
  - rmdir
- Workspace location example: E:\...\data_scrubbing
- Codex agent mode executing shell commands
Intended operation
Delete a temporary test directory:
qa,comma
Expected command behavior:
rmdir /s /q "qa,comma"
Expected result:
data_scrubbing\qa,comma removed
Actual command executed
The agent executed:
cmd /c "rmdir /s /q \"qa,comma\""
This command passed through:
Tool → PowerShell → cmd /c → rmdir
Observed behavior
After the command ran:
- the process ran for ~14 seconds
- the command timed out
- the workspace directory still existed
- almost all workspace contents were gone
- a second delete attempt returned:
  The system cannot find the file specified
Why this likely indicates a parsing bug
A small test directory deletion should complete almost instantly.
The ~14-second runtime before the timeout strongly suggests the command executed a recursive delete across a large directory tree.
The most consistent explanation is that the effective delete target became:
.
or another broader path such as the workspace root.
This would produce exactly the observed state:
workspace\
(empty)
because Windows cannot remove the current working directory itself but can remove its contents.
Risk
This class of failure is extremely dangerous because:
- a benign cleanup step can wipe the workspace
- the user sees a safe-looking command
- the actual executed target may differ due to shell parsing
Similar destructive cleanup issues have been reported in Codex agents previously.
Reproduction (hypothesis)
Possible trigger:
cmd /c "rmdir /s /q \"target\""
when passed through:
PowerShell → cmd → program
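If the quoting breaks anywhere along this chain, rmdir may receive a different target than the one shown in the command string.
Two details make this plausible: PowerShell's escape character is the backtick, so \" does not escape a quote inside a double-quoted PowerShell string, and cmd.exe treats an unquoted comma as an argument delimiter for built-in commands such as rmdir, so qa,comma can be split into qa and comma.
Which of these (if either) produced the observed result has not been confirmed.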
Suggested mitigations
Codex agent should implement safeguards before executing destructive filesystem commands:
- Resolve the final path before execution
- Log the fully resolved absolute path
- Reject deletes if target equals:
  - workspace root
  - parent of workspace
  - drive root
- Avoid shell-string deletes when possible
- Prefer direct PowerShell cmdlets such as:
  Remove-Item -LiteralPath <ABSOLUTE_PATH> -Recurse -Force
Impact
This behavior can lead to complete project data loss during automated cleanup tasks.
Adding path resolution and root protection would significantly reduce the risk.
Suggested training / policy improvements for Codex
This incident highlights a class of failure that should not rely on prompt-level safeguards.
The following safety rules should be incorporated into Codex agent behavior and training, not only recommended through prompts or AGENTS.md instructions.
Destructive filesystem operations
Codex should apply the following rules automatically:
- Never perform destructive filesystem operations through shell command strings.
- Avoid multi-shell chains such as:
  Tool → PowerShell → cmd → program
- Prefer direct API-level operations or native shell commands (e.g. PowerShell cmdlets).
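As an illustration of the "direct API-level operations" point, here is a minimal Python sketch that deletes a directory named qa,comma without any shell being involved. The helper names `delete_tree` and `demo`, the throwaway `workspace` directory, and the file names are hypothetical; this is not how Codex performs deletes, only an example of the approach.

```python
import shutil
import tempfile
from pathlib import Path


def delete_tree(target: Path) -> None:
    """Delete a directory tree through the filesystem API, with no shell parsing."""
    # The path travels as a single object, so characters such as ',' or ' '
    # are never reinterpreted as argument delimiters or quoting syntax.
    shutil.rmtree(target)


def demo() -> list[str]:
    """Build a throwaway workspace, delete only 'qa,comma', return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"  # hypothetical stand-in for the real workspace
        (workspace / "qa,comma").mkdir(parents=True)
        (workspace / "qa,comma" / "scratch.txt").write_text("temp")
        (workspace / "keep.txt").write_text("important")

        delete_tree(workspace / "qa,comma")
        return sorted(p.name for p in workspace.iterdir())


if __name__ == "__main__":
    print(demo())
```

Because no command string is ever built, there is no layer at which the comma or the quotes could be re-parsed.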
Recursive delete guardrails
Before executing any recursive delete (rm -rf, rmdir /s, etc.), the agent should:
- Resolve the final canonical absolute path
- Log the resolved path
- Verify the path is:
  - not the workspace root
  - not the workspace parent
  - not a drive root
  - not the current working directory
- Abort if validation fails
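The checklist above could be sketched roughly as follows. This is an illustrative Python example, not an existing Codex API: `UnsafeDeleteError`, `validate_delete_target`, and `guarded_rmtree` are hypothetical names, and the final "must be inside the workspace" check is an extra assumption that goes beyond the list.

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger("delete_guard")  # hypothetical logger name


class UnsafeDeleteError(Exception):
    """Raised when a delete target fails validation."""


def validate_delete_target(target: str, workspace: str) -> Path:
    """Return the canonical target path, or raise UnsafeDeleteError."""
    if not target.strip():
        raise UnsafeDeleteError("empty delete target")

    ws = Path(workspace).resolve()
    # Relative targets are resolved against the workspace, not the process cwd.
    resolved = (ws / target).resolve()

    forbidden = {
        "workspace root": ws,
        "workspace parent": ws.parent,
        "drive root": Path(resolved.anchor),
        "current working directory": Path.cwd().resolve(),
    }
    for label, path in forbidden.items():
        if resolved == path:
            raise UnsafeDeleteError(f"target is the {label}: {resolved}")

    # Assumption beyond the original list: the target must live inside the workspace.
    if ws not in resolved.parents:
        raise UnsafeDeleteError(f"target is outside the workspace: {resolved}")
    return resolved


def guarded_rmtree(target: str, workspace: str) -> Path:
    """Validate, log, then delete. A validation failure aborts before any deletion."""
    resolved = validate_delete_target(target, workspace)
    log.warning("recursive delete of %s", resolved)
    shutil.rmtree(resolved)
    return resolved
```

With this shape, a target such as `.` or `qa,comma/..` resolves to the workspace root and is rejected before anything is removed, while `qa,comma` passes.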
Argument validation
If a destructive command resolves to:
.
..
(empty path)
*
or any equivalent broad target, the command should be rejected automatically.
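A string-level pre-check for these broad targets might look like the sketch below. The function name `is_broad_target` and the exact set of patterns are assumptions made for illustration; such a check would complement, not replace, resolving the path.

```python
import re


def is_broad_target(raw: str) -> bool:
    """Return True if a raw delete argument names an overly broad target."""
    # Remove surrounding whitespace and quotes, then normalize separators.
    arg = raw.strip().strip("\"'").strip().replace("\\", "/")

    # Empty argument or a bare root: "", "/", "C:", "C:/".
    if re.fullmatch(r"(?:[A-Za-z]:)?/*", arg):
        return True

    # Drop empty and "." components so "./" and ".\" reduce to nothing.
    parts = [p for p in arg.split("/") if p not in ("", ".")]
    if not parts:
        return True

    # Paths made only of parent references or wildcards: "..", "../..", "*", "*.*".
    return all(p in ("..", "*", "*.*") for p in parts)
```

For example, `is_broad_target(".")` and `is_broad_target("C:\\")` return `True`, while `is_broad_target("qa,comma")` returns `False`. The pattern list is deliberately conservative and not exhaustive.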
Timeout handling
If a destructive command runs longer than expected:
- stop immediately
- perform no additional cleanup
- report the current filesystem state
Timeout during a recursive delete should be treated as a potential destructive incident, not a recoverable state.
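One way to express this policy is sketched below in Python. The names `snapshot` and `run_destructive` and the status strings are hypothetical. The sketch relies on `subprocess.run` killing the direct child process when its timeout expires; it does not attempt to handle grandchild processes.

```python
import subprocess
import sys
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Summarize what currently exists under root without modifying anything."""
    entries = sorted(p.name for p in root.iterdir()) if root.is_dir() else []
    return {"root": str(root), "exists": root.exists(), "entries": entries}


def run_destructive(argv: list[str], workspace: Path, timeout_s: float) -> dict:
    """Run argv without a shell; on timeout, stop and report state instead of retrying."""
    try:
        proc = subprocess.run(
            argv, cwd=workspace, timeout=timeout_s, capture_output=True, text=True
        )
        status = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child. No retry, no further cleanup.
        status = "possible_destructive_incident"
    # Always report the filesystem state so the user can see what is left.
    return {"status": status, "state": snapshot(workspace)}


if __name__ == "__main__":
    # Harmless stand-in for a long-running delete: a child that just sleeps.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_destructive(slow, Path.cwd(), timeout_s=0.5)["status"])
```

The key design choice is that the timeout branch does nothing except classify the outcome and report state; in particular it does not issue the second delete attempt seen in this incident.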
Why this should be part of training
Prompt-based safeguards are fragile.
Codex agents should treat destructive filesystem operations as high-risk actions requiring built-in guardrails, similar to:
- Git history rewrites
- credential access
- network exfiltration
Embedding these rules in the agent’s default behavior would significantly reduce the risk of accidental repository deletion.