Potential destructive command mis-parsing on Windows: agent cleanup via cmd /c may delete workspace content instead of target folder

Summary

During an automated cleanup step executed by a Codex agent on Windows, a command intended to delete a small temporary folder appears to have recursively deleted the entire workspace contents.

The most plausible explanation is that the final argument to rmdir was mis-parsed through a multi-shell chain:

Tool → PowerShell → cmd /c → rmdir

This may cause the effective delete target to become . or the workspace root when the argument is lost or misinterpreted.

The command timed out after ~14 seconds, which strongly suggests a recursive delete over a large directory tree rather than the intended small test directory.


Environment

  • OS: Windows 10

  • Shell stack:

    • tool invocation

    • PowerShell

    • cmd /c

    • rmdir

  • Workspace location (example): E:\...\data_scrubbing

  • Codex agent mode executing shell commands

Intended operation

Delete a temporary test directory:

qa,comma

Expected command behavior:

rmdir /s /q "qa,comma"

Expected result:

data_scrubbing\qa,comma removed

Actual command executed

The agent executed:

cmd /c "rmdir /s /q \"qa,comma\""

This command passed through:

Tool → PowerShell → cmd /c → rmdir

Observed behavior

After the command ran:

  • the process ran for ~14 seconds

  • the command timed out

  • the workspace directory still existed

  • almost all workspace contents were gone

  • a second delete attempt returned:

The system cannot find the file specified

Why this likely indicates a parsing bug

A small test directory deletion should complete almost instantly.

The ~14-second runtime strongly suggests the command executed a recursive delete across a large directory tree.

The most consistent explanation is that the effective delete target became:

.

or another broader path such as the workspace root.

This would produce exactly the observed state:

workspace\
    (empty)

because Windows cannot remove a directory that is the current working directory of a running process, but it can still delete that directory's contents.


Risk

This class of failure is extremely dangerous because:

  • a benign cleanup step can wipe the workspace

  • the user sees a safe-looking command

  • the actual executed target may differ due to shell parsing

Similar destructive cleanup issues have been reported in Codex agents previously.


Reproduction (hypothesis)

Possible trigger:

cmd /c "rmdir /s /q \"target\""

when passed through:

PowerShell → cmd → program

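If the final parameter is dropped or misparsed, rmdir may execute with an empty or default path. Two details make this plausible: PowerShell's escape character is the backtick rather than the backslash, so \" does not embed a literal quote inside a double-quoted PowerShell string, and cmd treats an unquoted comma as an argument delimiter, so a name like qa,comma is only safe while its quotes survive the whole chain.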
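
One way to test this hypothesis without risking real data is to run the suspect delete step inside a throwaway workspace and check which sentinel files survive. The sketch below uses Python purely as an illustration. The helper names (make_sandbox, run_and_report) are hypothetical, and the delete step is passed in as a callable so that the same harness could wrap a direct API call or, on a disposable Windows test machine, the real shell chain.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def make_sandbox() -> Path:
    """Create a disposable workspace with sentinel files and a 'qa,comma' target."""
    workspace = Path(tempfile.mkdtemp(prefix="rmdir_repro_"))
    (workspace / "src").mkdir()
    (workspace / "src" / "keep.txt").write_text("sentinel")
    (workspace / "README.txt").write_text("sentinel")
    target = workspace / "qa,comma"
    target.mkdir()
    (target / "tmp.txt").write_text("scratch")
    return workspace


def run_and_report(workspace: Path, delete: Callable[[Path], None]) -> dict:
    """Run the delete step, then report what is left in the sandbox."""
    target = workspace / "qa,comma"
    delete(target)
    sentinels = [workspace / "README.txt", workspace / "src" / "keep.txt"]
    return {
        "target_removed": not target.exists(),
        "sentinels_intact": all(p.exists() for p in sentinels),
    }


if __name__ == "__main__":
    ws = make_sandbox()
    try:
        # Baseline: a direct API delete that involves no shell parsing at all.
        print(run_and_report(ws, shutil.rmtree))
    finally:
        shutil.rmtree(ws, ignore_errors=True)
```

A run where target_removed is true but sentinels_intact is false would support the mis-parsing hypothesis for whichever delete step was plugged in.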


Suggested mitigations

The Codex agent should implement safeguards before executing destructive filesystem commands:

  1. Resolve the final path before execution

  2. Log the fully resolved absolute path

  3. Reject deletes if target equals:

    • workspace root

    • parent of workspace

    • drive root

  4. Avoid shell-string deletes when possible

  5. Prefer direct PowerShell cmdlets such as:

Remove-Item -LiteralPath <ABSOLUTE_PATH> -Recurse -Force

Impact

This behavior can lead to complete project data loss during automated cleanup tasks.

Adding path resolution and root protection would significantly reduce the risk.

Suggested training / policy improvements for Codex

This incident highlights a class of failure that should not rely on prompt-level safeguards.

The following safety rules should be incorporated into Codex agent behavior and training, not only recommended through prompts or AGENTS.md instructions.

Destructive filesystem operations

Codex should apply the following rules automatically:

  • Never perform destructive filesystem operations through shell command strings.

  • Avoid multi-shell chains such as:

Tool → PowerShell → cmd → program
  • Prefer direct API-level operations or native shell commands (e.g. PowerShell cmdlets).
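
To illustrate the "direct API" option, the following Python sketch hands the path to the filesystem API as a single value, so no shell ever tokenizes it and characters such as commas or quotes have no special meaning. delete_directory is a hypothetical helper name, and Python stands in here for whatever language the agent runtime actually uses.

```python
import shutil
from pathlib import Path


def delete_directory(target: Path) -> None:
    """Recursively delete `target` without building a shell command string."""
    # Refuse anything that is not a real directory (missing path, file, symlink).
    if target.is_symlink() or not target.is_dir():
        raise ValueError(f"refusing to delete non-directory or symlink: {target}")
    shutil.rmtree(target)
```

For the directory in this report, the call would look like delete_directory(workspace / "qa,comma"), where workspace is a Path to the workspace root.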

Recursive delete guardrails

Before executing any recursive delete (rm -rf, rmdir /s, etc.), the agent should:

  1. Resolve the final canonical absolute path

  2. Log the resolved path

  3. Verify the path is:

    • not the workspace root

    • not the workspace parent

    • not a drive root

    • not the current working directory

  4. Abort if validation fails
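
The four steps above could look roughly like the Python sketch below. It is an illustration under stated assumptions, not Codex's actual implementation: validate_delete_target is a hypothetical name, and it checks only the four conditions listed, so a real guard would likely add further restrictions.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def validate_delete_target(target: str, workspace: str) -> Path:
    """Resolve `target`, log it, and raise ValueError if it is too broad."""
    resolved = Path(target).resolve()  # step 1: canonical absolute path
    root = Path(workspace).resolve()
    log.info("recursive delete requested for %s", resolved)  # step 2

    # Step 3: the forbidden targets named in the list above.
    forbidden = {
        root: "workspace root",
        root.parent: "workspace parent",
        Path(resolved.anchor): "drive or filesystem root",
        Path.cwd().resolve(): "current working directory",
    }
    if resolved in forbidden:
        # Step 4: abort instead of deleting.
        raise ValueError(f"refusing to delete {resolved}: {forbidden[resolved]}")
    return resolved
```

The caller would delete only the Path returned by this function, never the original string, so the validated path and the deleted path cannot diverge.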


Argument validation

If a destructive command resolves to:

.
..
(empty path)
*

or any equivalent broad target, the command should be rejected automatically.
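
A check on the raw argument, before any path resolution, might be sketched as follows. This is Python for illustration only; is_broad_argument is a hypothetical name, and the check is meant to complement a resolved-path check rather than replace it, since inputs such as ..\.. are only caught after resolution.

```python
BROAD_TARGETS = {"", ".", ".."}
WILDCARD_CHARS = set("*?")


def is_broad_argument(raw: str) -> bool:
    """Return True if the raw path argument is empty, '.', '..', a root, or a wildcard."""
    # Drop surrounding whitespace and quotes that survived shell parsing.
    cleaned = raw.strip().strip("\"'").strip()
    # Trailing separators do not narrow the target: ".\\" still means ".",
    # and a bare "\\" or "/" (a root) collapses to the empty string.
    stripped = cleaned.rstrip("/\\")
    if stripped in BROAD_TARGETS:
        return True
    # A bare drive letter such as "C:" or "C:\\".
    if len(stripped) == 2 and stripped[0].isalpha() and stripped[1] == ":":
        return True
    return any(ch in WILDCARD_CHARS for ch in stripped)
```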


Timeout handling

If a destructive command runs longer than expected:

  • stop immediately

  • perform no additional cleanup

  • report the current filesystem state

Timeout during a recursive delete should be treated as a potential destructive incident, not a recoverable state.
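
The timeout policy above can be sketched in Python as shown below. run_destructive and DestructiveTimeout are hypothetical names, and the time budget is an assumption the caller has to choose. Note that subprocess.run kills only the direct child on timeout, so processes further down a shell chain may need separate handling.

```python
import subprocess
from pathlib import Path


class DestructiveTimeout(RuntimeError):
    """Raised when a destructive command exceeds its time budget."""


def run_destructive(argv: list[str], workspace: Path, timeout_s: float) -> None:
    """Run argv with a timeout; on timeout, report workspace state and stop."""
    try:
        subprocess.run(argv, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child at this point.
        # Do not retry and do not clean up: only record what is left.
        if workspace.exists():
            remaining = sorted(p.name for p in workspace.iterdir())
        else:
            remaining = None
        raise DestructiveTimeout(
            f"{argv!r} exceeded {timeout_s}s; workspace entries now: {remaining}"
        ) from exc
```

Raising a dedicated exception type makes it harder for calling code to treat the timeout as an ordinary, retryable failure.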


Why this should be part of training

Prompt-based safeguards are fragile.

Codex agents should treat destructive filesystem operations as high-risk actions requiring built-in guardrails, similar to:

  • Git history rewrites

  • credential access

  • network exfiltration

Embedding these rules in the agent’s default behavior would significantly reduce the risk of accidental repository deletion.


Temporary mitigation for users (until a fix is implemented)

## Cleanup Safety

- Never use shell-wrapped delete commands such as `cmd /c` for destructive filesystem operations.
- Resolve and normalize the path before validation and deletion.
- Delete only approved temporary paths inside the workspace, never arbitrary paths.
- Use only canonical absolute paths with `Remove-Item -LiteralPath ... -Recurse -Force -ErrorAction Stop`.
- Before recursive deletion, log and validate the final resolved path.
- Reject deletion if the target is empty, `.`, `..`, the workspace root, its parent, outside the approved temp area, or contains symlinks/junctions/reparse points.
- On any unexpected delete behavior or timeout: stop, do not retry, inspect, and report.
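
For readers who script their own cleanup, the "approved temporary area" and symlink rules above could be checked along these lines. This is a Python sketch with a hypothetical helper name (check_temp_target). It detects symlinks only: os.path.islink does not report Windows junctions or other reparse points, so a real implementation would need an extra check for those (for example os.path.isjunction, available from Python 3.12).

```python
import os
from pathlib import Path


def check_temp_target(target: Path, approved_root: Path) -> Path:
    """Return the resolved target if it is inside approved_root and symlink-free."""
    if target.is_symlink():
        raise ValueError(f"target is itself a symlink: {target}")
    resolved = target.resolve()
    approved = approved_root.resolve()
    # The target must be strictly inside the approved area, not the area itself.
    if approved not in resolved.parents:
        raise ValueError(f"{resolved} is not inside approved area {approved}")
    # os.walk does not follow symlinked directories by default, but it still
    # lists them in dirnames, so every entry can be inspected.
    for dirpath, dirnames, filenames in os.walk(resolved):
        for name in dirnames + filenames:
            entry = os.path.join(dirpath, name)
            if os.path.islink(entry):
                raise ValueError(f"symlink found inside target: {entry}")
    return resolved
```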

Thanks for the detailed write-up and for flagging this.

From what you described, it sounds like the cleanup command may have ended up deleting far more than the intended temporary folder, which is obviously concerning behavior. Reports like this are important, especially when they involve potentially destructive filesystem actions.

We’ve seen similar reports from other users in another thread, so we’re going to escalate this for investigation so the relevant team can take a closer look at the behavior and the command execution path involved.

Really appreciate you documenting the environment, command chain, and possible causes so clearly; that kind of detail helps a lot when investigating issues like this. If you happen to have any additional logs or reproducible steps, feel free to share them in the thread as well. Thanks again for bringing this to our attention.

– Taylor
