Built a red-team skill for Codex that tests prompt injection, MCP poisoning, and concealed agent actions

I built HackYourAgent, a public skill bundle for red-teaming coding-agent workflows.

The target use case is narrow on purpose: if you are building or shipping agents with Codex, this is meant to be the adversarial pass you run before trusting the workflow.

What it tests:

  • prompt injection
  • MCP/tool poisoning
  • memory poisoning
  • approval confusion
  • concealed side effects

What it actually does:
it maps trust boundaries in a repo or staging agent, creates paired control + attack trials, inspects outputs one by one, and saves findings/evidence/regressions under redteam/.

I focused on Codex-native usage instead of building a separate heavyweight eval stack, so the repo includes:

  • a native Codex skill wrapper
  • a self-contained installer
  • seeded vulnerable example targets for RAG injection, MCP poisoning, and concealment

Repo:

What I’d most like feedback on from Codex users:
does the control-vs-attack forensic workflow feel like the right shape for a Codex skill, or is it still too heavy for normal repo use?