I built HackYourAgent, a public skill bundle for red-teaming coding-agent workflows.
The target use case is narrow on purpose: if you are building or shipping agents with Codex, this is meant to be the adversarial pass you run before trusting the workflow.
What it tests:
- prompt injection
- MCP/tool poisoning
- memory poisoning
- approval confusion
- concealed side effects
What it actually does:
it maps trust boundaries in a repo or staging agent, creates paired control + attack trials, inspects outputs one by one, and saves findings/evidence/regressions under redteam/.
I focused on Codex-native usage instead of building a separate heavyweight eval stack, so the repo includes:
- a native Codex skill wrapper
- a self-contained installer
- seeded vulnerable example targets for RAG injection, MCP poisoning, and concealment
Repo:
What Iād most like feedback on from Codex users:
does the control-vs-attack forensic workflow feel like the right shape for a Codex skill, or is it still too heavy for normal repo use?