Let’s build a community-driven collection of practical tips to maximize productivity and efficiency when using OpenAI’s Codex. Share your favorite insights, and let’s improve our skills together!
Initial tips:
Choosing the Right Model:
Stick to GPT-5.2-Codex medium/high for regular tasks; reserve xhigh reasoning for especially challenging tasks due to higher cost and slower processing.
Better Documentation:
Equip your agents with high-quality local documentation instead of relying on web scraping.
Learning from Each Other:
Follow community members who share valuable workflows and insights. Maybe you’re one of them?
I will update this initial list with high-value contributions from the community. This will be a living document for the time being!
It is also important to emphasize that 100% test coverage of the core code is essential. Yes, it is a significant amount of work, and yes, it can feel like you are writing tests for situations that “should never happen.”
However, the current mindset of many programmers still assumes they are programming only with other humans. When AI is involved, that assumption breaks down. An AI often has visibility into the entire codebase and will actively use any available paths to satisfy the prompt—even if those paths lead to behavior the user never intended.
Achieving 100% test coverage helps keep the AI in check by clearly defining what is allowed, what is forbidden, and what must never occur. In that sense, exhaustive tests are no longer just about catching bugs—they are a critical part of constraining and guiding AI behavior.
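To make this concrete, here is a minimal sketch of a test that defines both allowed and forbidden behavior. The `apply_discount` function is hypothetical, used only to illustrate the idea of pinning down paths an AI must not take:

```python
# Hypothetical example: tests that spell out what is allowed and what is forbidden,
# so an AI editing this code cannot "satisfy the prompt" via an unintended path.

def apply_discount(price: float, percent: float) -> float:
    """Apply a percentage discount; reject out-of-range inputs."""
    if not (0.0 <= percent <= 100.0):
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1.0 - percent / 100.0), 2)

# What is allowed: the happy path.
assert apply_discount(100.0, 25.0) == 75.0

# What must never occur: a negative "discount" silently raising the price.
try:
    apply_discount(100.0, -10.0)
    raise AssertionError("expected ValueError for negative percent")
except ValueError:
    pass
```

The second test exists purely to forbid a behavior; without it, an AI could "fix" a failing call site by making negative discounts valid.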
Another effective way to leverage tests is to define clear criteria for when a task is considered done. By starting with a test that specifies the expected behavior, the model can iterate until the implementation satisfies it.
In my view, this is one of the strongest features available right now. It lets us be fairly confident that the result will work, which is particularly valuable when building prototypes.
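As a minimal sketch of this test-first loop: the spec is written before any implementation exists, and the model iterates on the function body until the assertions pass. `slugify` here is a hypothetical example function, not from any of the projects discussed above:

```python
# Test-first workflow: the assertions below were (conceptually) written before
# the implementation, and define when the task is "done".

def slugify(title: str) -> str:
    """Candidate implementation the model refines until the spec passes."""
    return "-".join(title.lower().split())

# The spec that defines completion:
assert slugify("Hello World") == "hello-world"
assert slugify("  Extra   Spaces ") == "extra-spaces"
```

Once the spec passes, the task is done by definition, which is exactly what makes this loop so reliable for prototypes.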
I'm testing an AGENTS.md edit that explicitly says agents should not read docs unless explicitly needed. This helps the AI avoid rotting the context in early prompting stages. Your .md files for agents should be strong, though. I'm still testing the outputs, so let me know if you find success! (I know about GSD and how it fights context rot; I'm trying not to rush some features in my projects with parallelization.)
How are you evaluating the results? Is it mostly based on intuition, or are you running any quantifiable checks?
Otherwise, this approach sounds very reasonable: faster, cheaper or using fewer tokens from the quota, and with a lower risk of losing relevant information during compaction.
I'm mostly testing changes against a test suite I have for my software. I've successfully closed the loop, and I also use Greptile reviews in my Git repo when I open PRs (thanks to th3o from t3).
While skills are designed to activate automatically, they can also be invoked on demand.
Why would you want that workflow?
For example:
A common need across many AI tools is creating a plan that is materialized as a Markdown document and saved locally in a temp directory whose exact location is not reported. That document often needs to be stored or exported to a specific location with a specific name, for example in a project's plans/ directory, or referenced or attached to other related documents.
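As an illustration of the export step, here is a hypothetical helper that copies a plan document (once the agent has reported its temp path) into a project's plans/ directory under a dated name. The function name and naming scheme are assumptions for the sketch, not any tool's actual behavior:

```python
# Sketch: export a plan Markdown file from its temp location into the
# project's plans/ directory with a predictable, dated filename.
import datetime
import pathlib
import shutil

def export_plan(temp_plan_path: str, project_root: str, name: str) -> pathlib.Path:
    """Copy the plan file to <project_root>/plans/<YYYY-MM-DD>-<name>.md."""
    plans_dir = pathlib.Path(project_root) / "plans"
    plans_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    dest = plans_dir / f"{stamp}-{name}.md"
    shutil.copy2(temp_plan_path, dest)  # preserves the original file's metadata
    return dest
```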
OpenAI staff noted this use case today while reviewing Plan Mode issues.
@EricGT, circling back to the best practices discussion. You mentioned aiming for 100 percent code coverage. Do you mainly use this as a way to help Codex produce better and faster results, or do you see it as more general guidance for pairing with an AI while coding?
From my perspective, defining tests upfront to clearly signal when a task is complete is an excellent way to get higher quality outputs and ensure the solution actually works.
In one setup, the test results are produced externally, so I have the Codex CLI wait for a predefined amount of time before checking the logs. It works very well because it allows the loop to continue until a required result is achieved.
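A wait-then-check loop like the one described can be sketched as follows. The log path, marker string, and timings are assumptions for illustration; the real setup would use whatever the external test run actually writes:

```python
# Sketch: poll a log file until an externally run test suite writes a
# result marker, or give up after a timeout.
import pathlib
import time

def wait_for_results(log_path: str, timeout_s: float = 600.0,
                     poll_s: float = 5.0) -> str:
    """Return the log contents once a 'RESULT:' marker appears."""
    deadline = time.monotonic() + timeout_s
    log = pathlib.Path(log_path)
    while time.monotonic() < deadline:
        if log.exists():
            text = log.read_text()
            if "RESULT:" in text:  # marker assumed to be written by the external run
                return text
        time.sleep(poll_s)
    raise TimeoutError(f"no result in {log_path} after {timeout_s}s")
```

The agent calls this between iterations, so the loop naturally continues until the required result shows up in the logs.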
It is closer to general guidance for pairing with an AI while coding.
In some situations, I think of a strong AI coding model as a software developer that complements my experience. There are things I know that it does not, and things it knows that I do not. The more we collaborate—especially in pushing toward 100% test coverage of core code—the more both of us learn.
I think it’s a good approach to leverage tests for better results by building the validation loop at the very beginning of a task. Codex can then iterate over the results until the desired outcome is achieved. Designing the tests to be meaningful then becomes a necessary part of the planning phase.
One thing I am actively experimenting with and testing is something called "Tiger Style" programming, which was created by the developers of the TigerBeetle database. It's a very strict and safe way to ensure that everything works, and it is a technique I have never seen anyone else use in my 11+ years as a software engineer.
I created a repo where I am actively trying to modify it for coding with AI, specifically Codex. You can find it here on my github. I can’t share links yet, so here is my attempt to show you where to find it.
github / robertguss/programming_tiger_style
Inside the resources/ directory are the official Tiger Style development docs, along with NASA's Power of 10 rules, which they use for creating mission-critical software that cannot tolerate bugs.
I hope this helps someone else, and please let me know if you try it and give me feedback. This style is in addition to TDD, which I have made mandatory, as I get the best results from AI when it follows a strict red-green-refactor process.
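To give a flavor of the approach, here is a small Python sketch of two rules in the spirit of Tiger Style and NASA's Power of 10: every loop gets a fixed upper bound, and pre- and postconditions are asserted explicitly. The `integer_sqrt` function and the specific bound are illustrative choices, not taken from the repo above:

```python
# Illustrative sketch of two Power-of-10-style rules:
#   1. Every loop has a fixed, explicit upper bound (no unbounded `while True`).
#   2. Preconditions and postconditions are asserted, not assumed.

MAX_ITERATIONS = 1000  # explicit bound: the loop can never run away

def integer_sqrt(n: int) -> int:
    """Largest integer r such that r*r <= n, for small non-negative n."""
    assert n >= 0, "precondition: n must be non-negative"
    result = 0
    for _ in range(MAX_ITERATIONS):  # bounded loop instead of `while True`
        if (result + 1) * (result + 1) > n:
            break
        result += 1
    assert result * result <= n < (result + 1) * (result + 1), "postcondition"
    return result
```

The point is less about this particular algorithm and more that these rules give an AI (and a reviewer) hard guardrails: any change that violates a bound or an assertion fails loudly instead of drifting silently.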