Aardvark: OpenAI’s agentic security researcher

What it is:
Aardvark is a GPT-5–powered autonomous agent that acts like a human security researcher. It continuously scans codebases to find, validate, and help fix vulnerabilities. It’s in private beta.

How it works (pipeline):

  • Analysis: Builds a repo-specific threat model by reading the entire codebase.

  • Commit scanning: Monitors new commits and also back-scans history; explains suspected vulnerabilities with annotated code.

  • Validation: Reproduces issues in a sandbox to confirm exploitability and cut false positives.

  • Patching: Proposes one-click patches via Codex; fixes are attached to each finding for human review.

It integrates with GitHub and existing workflows, and relies on LLM reasoning and tool use rather than traditional fuzzing or software composition analysis (SCA).

Impact/Results so far:

  • Has been running for months on OpenAI’s internal repos and with external alpha partners; it has surfaced meaningful, hard-to-trigger issues.

  • On “golden” test repos, detected 92% of known/synthetic vulnerabilities (high recall).

Open source stance:

  • Already found and responsibly disclosed multiple OSS vulnerabilities; ten received CVE IDs.

  • Plans pro-bono scanning for select non-commercial OSS projects.

  • Updated outbound disclosure policy to prioritize collaboration over rigid timelines.

Why it matters:

  • Software vulns are systemic risk (40k+ CVEs in 2024; ~1.2% of commits introduce bugs).

  • “Defender-first” model: continuous, validated protection with actionable fixes—without slowing development.

Availability:

  • Private beta open to select partners and OSS projects (apply to join).

Contributors listed: Akshay Bhat, Andy Nguyen, Dave Aitel, Harold Nguyen, Ian Brelinsky, Tiffany Citra, Xin Hu, Matt Knight.

https://openai.com/index/introducing-aardvark/
