Let’s build a community-driven collection of practical tips to maximize productivity and efficiency when using OpenAI’s Codex. Share your favorite insights, and let’s improve our skills together!
Initial tips:
Choosing the Right Model:
Stick to GPT-5.2-Codex on medium or high reasoning for everyday tasks; reserve xhigh reasoning for especially challenging problems, since it costs more and runs slower.
Better Documentation:
Equip your agents with high-quality local documentation instead of relying on web scraping (see the sketch at the end of this post).
Learning from Each Other:
Follow community members who share valuable workflows and insights. Maybe you’re one of them?
I will update this initial list with high-value contributions from the community. This will be a living document for the time being!
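As promised above, here is a minimal Python sketch of the local-documentation idea. The script, the `docs/vendor/` layout, and the URL are my own illustrative assumptions, not part of any Codex tooling; the point is simply to snapshot the reference pages your agent needs into the repo so it never has to scrape the web mid-task.

```python
# fetch_docs.py - snapshot reference docs into the repo so agents read
# local files instead of scraping the web. The URL below is a placeholder.
from pathlib import Path
from urllib.request import urlopen

DOCS = {
    # "output filename": "source URL" -- replace with the docs you rely on
    "httpx-quickstart.html": "https://www.python-httpx.org/quickstart/",
}

def main() -> None:
    out_dir = Path("docs/vendor")
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, url in DOCS.items():
        with urlopen(url) as resp:
            (out_dir / name).write_bytes(resp.read())
        print(f"saved {name}")

if __name__ == "__main__":
    main()
```

Commit the snapshots alongside the code, and your agents can be pointed at `docs/vendor/` instead of the open web.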
It is also worth emphasizing that 100% test coverage of the core code is essential. Yes, it is a significant amount of work, and yes, it can feel like you are writing tests for situations that “should never happen.”
However, the current mindset of many programmers still assumes they are programming only with other humans. When AI is involved, that assumption breaks down. An AI often has visibility into the entire codebase and will actively use any available paths to satisfy the prompt—even if those paths lead to behavior the user never intended.
Achieving 100% test coverage helps keep the AI in check by clearly defining what is allowed, what is forbidden, and what must never occur. In that sense, exhaustive tests are no longer just about catching bugs—they are a critical part of constraining and guiding AI behavior.
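To make “what must never occur” concrete, here is a minimal pytest sketch. `apply_discount` is a hypothetical function invented for illustration; what matters is that the negative-path tests close off shortcuts a model might otherwise take, such as silently clamping bad input instead of rejecting it.

```python
# test_discount.py - exhaustive tests double as guardrails for AI edits:
# they pin down the happy path AND the paths that must stay closed.
# (apply_discount is inlined here so the sketch runs standalone; in a
# real repo it would live in its own module.)
import pytest

def apply_discount(price: float, percent: int) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100

def test_normal_discount():
    assert apply_discount(100.0, percent=20) == 80.0

def test_zero_discount_is_identity():
    assert apply_discount(100.0, percent=0) == 100.0

@pytest.mark.parametrize("percent", [-1, 101, 1000])
def test_out_of_range_must_raise(percent):
    # The forbidden path: an AI "fixing" a failing call by clamping the
    # value instead of raising would break this test.
    with pytest.raises(ValueError):
        apply_discount(100.0, percent=percent)
```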
Another effective way to leverage tests is to define clear criteria for when a task is considered done. By starting with a test that specifies the expected behavior, the model can iterate until the implementation satisfies it.
In my view, this is one of the strongest features available right now. It lets us be fairly confident that the result will work, which is particularly valuable when building prototypes.
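In practice this can be as simple as handing the model a failing spec like the sketch below and asking it to iterate until pytest passes. `slugify` and the `textutils` module are hypothetical stand-ins for whatever you are actually building; the file fails by design until the model writes the implementation.

```python
# test_slugify.py - written *before* the implementation; the task is
# "done" when the model makes these assertions pass.
from textutils import slugify  # does not exist yet; the model must create it

def test_basic():
    assert slugify("Hello, World!") == "hello-world"

def test_collapses_whitespace():
    assert slugify("  many   spaces ") == "many-spaces"

def test_empty_input():
    assert slugify("") == ""
```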
I'm testing an AGENTS.md edit that explicitly says agents should not read docs unless explicitly needed. This helps the AI avoid rotting the context in the early prompting stages, but your agent .md files should be strong on their own. I'm still testing the outputs, so let me know if you find success! (I know about GSD and how it fights context rot; I'm trying not to rush some features in my projects with parallelization.)
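For reference, the rule I'm testing looks roughly like this (my own wording, not an official AGENTS.md schema; adapt the paths to your repo):

```
## Documentation policy
- Do NOT read files under docs/ unless the current task explicitly
  requires them.
- When you do need docs, open only the specific file named in the task,
  never the whole directory.
```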
How are you evaluating the results? Is it mostly based on intuition, or are you running any quantifiable checks?
Otherwise, this approach sounds very reasonable: it should be faster, cheaper (or at least use fewer tokens from the quota), and carry a lower risk of losing relevant information during compaction.
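For example, even something as simple as the sketch below would count as a quantifiable check in my book: log each session's setup, token usage, and pass/fail to a CSV by hand and compare the averages. The file name and column names are just an assumption for illustration.

```python
# compare_runs.py - toy comparison of two prompting setups from a
# hand-kept CSV log with columns: setup,tokens,passed (assumed format).
import csv
from collections import defaultdict

def summarize(path: str) -> None:
    stats = defaultdict(lambda: {"n": 0, "tokens": 0, "passed": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            s = stats[row["setup"]]
            s["n"] += 1
            s["tokens"] += int(row["tokens"])
            s["passed"] += row["passed"].lower() == "true"
    for setup, s in stats.items():
        print(f"{setup}: {s['n']} runs, "
              f"avg tokens {s['tokens'] / s['n']:.0f}, "
              f"pass rate {s['passed'] / s['n']:.0%}")

if __name__ == "__main__":
    summarize("codex_runs.csv")
```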