Hi everyone,
I wanted to share a recent project I've just finished with Codex. Frankly, it redefined what I thought was possible with an AI engineering assistant.
Over the past few months, I've been using Codex to execute a full front-end migration of OpenStack. I took a forked, disconnected clone of what is arguably one of the largest Python-based open-source projects in existence, a public repo with over a decade of drift, bloat and technical debt, and pointed Codex straight at it.
The Challenge
Take multiple hybrid React/Angular/NPM front-end UIs and plugins and replace them entirely with a single, streamlined Python-only Streamlit console that integrates directly with a FastAPI sidecar and a modern CI/CD toolchain.
What Codex Achieved
In just under two weeks, Codex:
- Re-architected the entire UI layer into six clean functional consoles:
  - 01_Provision → Infrastructure bootstrap
  - 02_Secure → IAM, key and policy management
  - 03_Observe → Metrics, logging and alerts
  - 04_Orchestrate → Workflow execution
  - 05_Recover → Rollback and snapshot management
  - 06_Assets → Resource and cost overview
- Removed all Node/NPM dependencies, replacing them with a deterministic, Python-only runtime and shaving over 250k LOC from the repo.
- Implemented a FastAPI sidecar, complete with request validation, CORS and async-safe endpoints.
- Applied strict hygiene and security controls:
  - ruff check → zero warnings
  - mypy --strict → 100% typing coverage
  - bandit -q -r src → clean security audit
  - semgrep --config auto → no policy violations
  - pytest -q → full test suite passed
How It Worked
Codex wasn't just following prompts; it behaved like an autonomous engineer inside a CI pipeline.
Each prompt was structured like a development ticket:
- Scope: file paths, module purpose, expected outputs
- Constraints: no dependency drift, maintain test isolation, preserve interface contracts
- Tests to pass: targeted pytest paths with --maxfail=1
- Acceptance criteria: CI tools green and zero static-analysis failures
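A ticket-style prompt like the one described could be modelled roughly as a small dataclass. The field and method names here are my own stand-ins, not Codex's actual prompt format:

```python
# Illustrative model of a "development ticket" prompt structure.
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    """One Codex work item: scope, constraints, tests, acceptance criteria."""
    scope: str                # file paths, module purpose, expected outputs
    constraints: list[str]    # e.g. "no dependency drift"
    tests_to_pass: list[str]  # targeted pytest paths
    acceptance: str = "CI green, zero static-analysis failures"

    def to_prompt(self) -> str:
        """Render the ticket as a structured, line-oriented prompt."""
        lines = [f"Scope: {self.scope}"]
        lines += [f"Constraint: {c}" for c in self.constraints]
        lines += [f"Test: pytest {t} --maxfail=1" for t in self.tests_to_pass]
        lines.append(f"Acceptance: {self.acceptance}")
        return "\n".join(lines)
```

The point of the structure is repeatability: every prompt carries the same four sections, so the agent's success criteria are explicit and machine-checkable.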
Codex ran full loops:
- Generated the code
- Executed pytest
- Analysed tracebacks
- Applied deterministic fixes
- Re-ran tests until clean
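That generate → test → fix loop can be sketched as a small driver that re-runs the suite and feeds failures back until everything is green. The function and parameter names are mine; `apply_fix` is a stand-in for the model analysing a traceback and patching code:

```python
# Sketch of a re-run-until-clean loop around a test command.
import subprocess
from typing import Callable, Sequence

def run_until_clean(
    apply_fix: Callable[[str], None],
    test_cmd: Sequence[str] = ("pytest", "-q", "--maxfail=1"),
    max_rounds: int = 5,
) -> bool:
    """Run tests, hand any failure output to a fixer, repeat until green."""
    for _ in range(max_rounds):
        result = subprocess.run(list(test_cmd), capture_output=True, text=True)
        if result.returncode == 0:
            return True                                # suite clean: ready for a PR
        apply_fix(result.stdout + result.stderr)       # analyse traceback, patch code
    return False                                       # still red: needs human review
```

Bounding the loop with `max_rounds` is what keeps it deterministic rather than open-ended: either the suite goes green within the budget, or the work item is escalated.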
The process was recursive, not reactive.
By the time Codex submitted a pull request, it had already passed the equivalent of a senior engineer’s pre-merge review and cleared all GitHub automated CI gates.
Code Quality and Architecture
The code it produced wasn't just functional; it was elegant:
- Fully typed functions, clear docstrings and consistent async usage.
- No orphan imports, no circular dependencies and no duplicated logic.
- Cohesive internal structure with a capsule pattern for modular isolation.
- Sensible naming conventions, strong abstraction boundaries and zero technical-debt injection.
- Readable and maintainable: if you showed the output to a human dev team, they'd assume it was the result of weeks of careful refactoring.
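For a flavour of that style, here is an entirely illustrative helper (not from the repo) showing full typing, a docstring, and consistent async usage in one place:

```python
# Illustrative example of the typed, documented, async-consistent style described.
import asyncio

async def gather_statuses(
    services: dict[str, str], timeout: float = 5.0
) -> dict[str, bool]:
    """Probe each service concurrently without blocking the event loop.

    Returns a mapping of service name to reachability.
    """
    async def check(name: str, url: str) -> tuple[str, bool]:
        try:
            # Stand-in for a real async HTTP probe (e.g. httpx.AsyncClient).
            await asyncio.wait_for(asyncio.sleep(0), timeout)
            return name, True
        except asyncio.TimeoutError:
            return name, False

    results = await asyncio.gather(*(check(n, u) for n, u in services.items()))
    return dict(results)
```

Nothing clever, just consistent: every function typed, every coroutine awaited, no blocking calls hiding inside async code.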
Codex produced code that simultaneously met a heavy-duty quality baseline:
pdm-managed reproducibility, strict Python-only hygiene (no Node or webpack), deterministic CI gating (ruff, black, mypy --strict, pytest, bandit, semgrep, import-linter), ≥ 95 % test coverage, full governance documentation (CONFORMANCE.md, HOWTO.local.md), signed SBOM evidence, and verified NPM-free integrity.
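Deterministic CI gating of this kind amounts to running each tool in a fixed order and reporting exactly which gates failed. The tool list below is taken from the post (`lint-imports` is import-linter's CLI entry point); the driver itself is my own sketch, not the project's actual CI script:

```python
# Sketch of a deterministic CI gate runner over the tools named in the post.
import subprocess
from typing import Sequence

GATES: tuple[tuple[str, ...], ...] = (
    ("ruff", "check", "."),
    ("black", "--check", "."),
    ("mypy", "--strict", "src"),
    ("pytest", "-q"),
    ("bandit", "-q", "-r", "src"),
    ("semgrep", "--config", "auto"),
    ("lint-imports",),  # import-linter's CLI
)

def run_gates(gates: Sequence[Sequence[str]] = GATES) -> list[str]:
    """Run each gate in a fixed order; return the commands that failed."""
    failed: list[str] = []
    for cmd in gates:
        result = subprocess.run(list(cmd), capture_output=True)
        if result.returncode != 0:
            failed.append(cmd[0])
    return failed
```

Because the order and tool versions are pinned (here via pdm), two runs on the same tree produce the same verdict, which is what makes the gating "deterministic" rather than flaky.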
All code was validated autonomously by Codex with zero manual intervention, and every change was traceable back to documentation. The resulting codebase cleanly passed all automated and manually executed CI checks: the exact opposite of what people call "AI slop".
The Broader Implication
What impressed me most wasn't speed; it was semantic comprehension.
Codex didn’t just manipulate files; it understood architectural intent. It could reason across modules, preserve functional boundaries, and refactor while maintaining dependency integrity.
This was the first time I’ve seen an AI system:
- Absorb an enterprise-scale codebase
- Execute refactors at human-engineering quality
- Validate them autonomously through CI
- And self-correct without intervention
That's not a co-pilot; that's a senior engineer operating inside the repo.
Why Share This
I wanted to share this because it shows the positive extreme of what’s possible when Codex is used in structured, disciplined workflows.
It's not "AI coding snippets"; it's AI-led systems engineering that's reproducible, testable and CI-aligned.
I'd love to hear whether others have pushed Codex into similar territory: large-scale migrations, complex refactors, or multi-repo orchestration. I suspect this is where the real frontier of agentic development lies.