Tips and Tricks for using Codex

Best Practices for Codex

Let’s build a community-driven collection of practical tips to maximize productivity and efficiency when using OpenAI’s Codex. Share your favorite insights, and let’s improve our skills together!

Initial tips:

  1. Choosing the Right Model:

    • Stick to GPT-5.2-Codex medium/high for regular tasks; reserve xhigh reasoning for especially challenging tasks due to higher cost and slower processing.
  2. Better Documentation:

    • Equip your agents with high-quality local documentation instead of relying on web scraping.
  3. Learning from Each Other:

    • Follow community members who share valuable workflows and insights. Maybe you’re one of them?

I will update this initial list with high-value contributions from the community. This will be a living document for the time being!

Shout-out to @aprendendo.next, @EricGT, @merefield for initial valuable contributions!

7 Likes

It is also important to emphasize that 100% test coverage of the core code is essential. Yes, it is a significant amount of work, and yes, it can feel like you are writing tests for situations that “should never happen.”

However, the current mindset of many programmers still assumes they are programming only with other humans. When AI is involved, that assumption breaks down. An AI often has visibility into the entire codebase and will actively use any available paths to satisfy the prompt—even if those paths lead to behavior the user never intended.

Achieving 100% test coverage helps keep the AI in check by clearly defining what is allowed, what is forbidden, and what must never occur. In that sense, exhaustive tests are no longer just about catching bugs—they are a critical part of constraining and guiding AI behavior.
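As a concrete illustration (not from the thread), here is a minimal Python sketch of what exhaustive-branch testing looks like, including the "should never happen" paths that keep an agent from quietly repurposing an unguarded code path. The function and its guards are hypothetical:

```python
# Illustrative only: a tiny function whose "impossible" branches are all tested,
# so an agent cannot quietly use an unguarded path to satisfy a prompt.

def withdraw(balance: float, amount: float) -> float:
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

def raises_value_error(fn) -> bool:
    """Helper so every guard clause can be asserted directly."""
    try:
        fn()
    except ValueError:
        return True
    return False

# Every branch is exercised, including the ones that "should never happen".
assert withdraw(100.0, 40.0) == 60.0
assert raises_value_error(lambda: withdraw(100.0, 0.0))
assert raises_value_error(lambda: withdraw(10.0, 20.0))
```

With every branch pinned down like this, a change that removes or weakens a guard immediately fails a test instead of silently opening a new path.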

8 Likes

Another effective way to leverage tests is to define clear criteria for when a task is considered done. By starting with a test that specifies the expected behavior, the model can iterate until the implementation satisfies it.

In my view, this is one of the strongest features available right now. It lets us be fairly confident that the result will work, which is particularly valuable when building prototypes.
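A minimal sketch of that test-first workflow; the slugify task, its name, and its acceptance criteria are hypothetical examples, not from the thread:

```python
# Hypothetical task: the acceptance test is written before the implementation,
# so "done" is unambiguous and the model can iterate until the test passes.

def slugify(title: str) -> str:
    # str.split() with no argument also collapses runs of whitespace
    # and drops leading/trailing spaces.
    return "-".join(title.lower().split())

# Done-criteria, written first:
assert slugify("Hello World") == "hello-world"
assert slugify("  Codex   Tips ") == "codex-tips"
```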

3 Likes

I'm testing an AGENTS.md edit that explicitly says agents should not read docs unless explicitly needed. This helps the AI avoid rotting the context in the early prompting stages; your .md files for agents should be strong, though. I'm still testing the outputs, so let me know if you find success! (I know about gsd and how it fights context rot; I'm trying not to rush some features in my projects with parallelization.)

3 Likes

How are you evaluating the results? Is it mostly based on intuition, or are you running any quantifiable checks?

Otherwise, this approach sounds very reasonable: faster, cheaper (or at least using fewer tokens from the quota), and with a lower risk of losing relevant information during compaction.

3 Likes

As an avid user of the VS Code Codex extension (Codex – OpenAI’s coding agent) and of Plan mode with AI, I now have both.

Note: Plan mode with Codex, as opposed to the plan skill for Codex, is so new that I could not find any official documentation at Codex.

Thanks to CSWYF3634076 · GitHub who noted this is now active.

For VS Code, since the change is very recent, I had to reinstall the extension to get version 0.4.69.

Then click “+” to reveal the “Plan mode” selector.

When selected, it updates the status bar.

1 Like

I'm mostly testing changes on a test suite I have for my software. I have successfully closed the loop, and also with Greptile reviews in my git repo when I do PRs (thanks to th3o from t3).

1 Like

While skills are designed to activate automatically, they can also be invoked on demand.

Why would you want that workflow?

For example:

A common need across many AI tools is to create a plan that is materialized as a Markdown document and saved locally in some temp directory, with the exact location not reported. That document often needs to be stored or exported to a specific location with a specific name (for example, in a project’s plans/ directory), or referenced or attached to other related documents.
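A hedged sketch of what such an on-demand export step might look like; the export_plan helper and its arguments are hypothetical, with only the plans/ directory convention taken from the description above:

```python
import shutil
from pathlib import Path

def export_plan(temp_plan: str, project_root: str, name: str) -> Path:
    """Copy a plan generated in a temp location into the project's plans/ dir."""
    dest_dir = Path(project_root) / "plans"
    dest_dir.mkdir(parents=True, exist_ok=True)  # create plans/ if missing
    dest = dest_dir / name
    shutil.copy2(temp_plan, dest)  # preserves timestamps of the original
    return dest
```

The point is just to give the on-demand skill invocation a deterministic destination and file name instead of an unreported temp path.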

While I was reviewing Plan Mode issues today, this use case was noted by OpenAI staff.

1 Like

I still need to update the main list, but the team is really pushing to get everyone on board with Codex.

Learning opportunity at 2026-02-05T19:00:00Z

2 Likes

Sorry. No Mac walled garden here to run their app.

For next time: cross-platform, open source, interpreted, inspectable.

I’ve confirmed with the team that Linux and Windows versions are on the roadmap.

No need to further derail the topic.

“BREAKING NEWS: gpt-5.3-codex is out! Upgrade to 0.98.0 to try it.”

and the updated message

“BREAKING NEWS: gpt-5.3-codex is out! Upgrade to 0.98.0 for a faster, smarter, more steerable agent.”


How I was able to access the model gpt-5.3-codex:

Edited config.toml to set the model:

model = "gpt-5.3-codex"

Since I use VS Code: Ctrl + Shift + P
> Developer: Reload Window

The model now shows in the model selector.

Note: > Developer: Reload Window is needed for each open VS Code window.

2 Likes

@EricGT, circling back to the best practices discussion. You mentioned aiming for 100 percent code coverage. Do you mainly use this as a way to help Codex produce better and faster results, or do you see it as more general guidance for pairing with an AI while coding?

From my perspective, defining tests upfront to clearly signal when a task is complete is an excellent way to get higher quality outputs and ensure the solution actually works.

In one setup, the test results are produced externally, so I have the Codex CLI wait for a predefined amount of time before checking the logs. It works very well because it allows the loop to continue until a required result is achieved.
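A minimal sketch of that wait-then-check loop; the log path, marker string, and timeout values are hypothetical placeholders for whatever the external test run produces:

```python
import time
from pathlib import Path

def wait_for_result(log_path: str, marker: str,
                    timeout_s: float = 300.0, poll_s: float = 5.0) -> bool:
    """Poll an externally written log until `marker` shows up or time runs out."""
    deadline = time.monotonic() + timeout_s
    log = Path(log_path)
    while time.monotonic() < deadline:
        if log.exists() and marker in log.read_text():
            return True  # required result achieved; the agent loop can stop
        time.sleep(poll_s)
    return False  # timed out; the agent should iterate again
```

Returning a plain boolean makes it easy for the CLI loop to decide whether to keep iterating or report completion.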

1 Like

General guidance for pairing with an AI while coding is closer.

In some situations, I think of a strong AI coding model as a software developer that complements my experience. There are things I know that it does not, and things it knows that I do not. The more we collaborate—especially in pushing toward 100% test coverage of core code—the more both of us learn.

1 Like

I think it’s a good approach to leverage tests for better results by building the validation loop at the very beginning of a task. Codex can then iterate over the results until the desired outcome is achieved. Designing the tests to be meaningful then becomes a necessary part of the planning phase.

One thing I am actively experimenting with and testing is something called “Tiger Style” programming, which was created by the developers of the TigerBeetle database. It’s a very strict, safety-focused way to ensure that everything works, and it is a technique I had never seen anyone else use in my 11+ years as a software engineer.

I created a repo where I am actively trying to modify it for coding with AI, specifically Codex. You can find it here on my github. I can’t share links yet, so here is my attempt to show you where to find it.

github / robertguss/programming_tiger_style

Inside the resources/ directory are the official Tiger Style development docs and also the Power of Ten rules from NASA, which they use for creating mission-critical software that cannot tolerate bugs.

I hope this helps someone else; please let me know if you try it and give me feedback on it. This style is in addition to TDD, which I have enforced and made mandatory, as I get the best results from AI when it follows a strict red, green, refactor process.

Cheers!

2 Likes

Here’s a tip:

  • OpenAI: expose the AI model’s top_p and temperature, or reconfigure them lower, so the model’s sampler doesn’t select mistakes when reciting code back verbatim to the apply_patch tool (or whatever it takes for reliability).

This is in a setup where the codebase in context is always the current state of the full files that have write access allowed, and the AI knows they are the live versions between patches:

Codex 5.2 API says

  • Patch one change at a time (imports, then a single function signature, then a single return statement, etc.).
  • When a patch failed, I located the actual current text (as shown in the file content) and rewrote the hunk to match that exact text.

Concrete examples from our history:

  • The patch that tried to change:

    • xxx and ...
      failed because the file already had a different expression.
      Fix: patch the exact existing line (or patch the return tuple instead, which did exist).
  • The patch that expected a return True in _xxx failed because the function didn’t have that exact line in that exact place.
    Fix: patch the exact block around the lazy-bq return that did exist, and adjust the function signature and returns accordingly.

Did the apply_patch tool not act like its description?

From what we observed: it behaved exactly as described.

  • It is strict about matching the provided context.
  • It does not do approximate matching, AST-based matching, or whitespace-insensitive matching.
  • When it says “Invalid Context”, it really means “I can’t find this exact sequence of lines”.

So: no, it didn’t behave unexpectedly; the failures were due to my patch hunks not matching the file’s exact current text.
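To make the exact-matching behavior concrete, here is a toy Python sketch of strict, apply_patch-style hunk application. This is an illustration of the matching rule only, not the actual tool's implementation:

```python
def apply_hunk(text: str, old: str, new: str) -> str:
    """Toy model of strict context matching: no fuzzy, whitespace-insensitive,
    or AST-based matching; `old` must equal the file's current text exactly."""
    if old not in text:
        raise ValueError("Invalid Context: exact lines not found")
    return text.replace(old, new, 1)  # patch only the first occurrence
```

Any drift between the hunk and the live file, even a single changed expression, produces the "Invalid Context" failure rather than an approximate match.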

The errors are beyond what could be improved on with “fuzzy” matching.


I sort of automated a linter, but regenerating the file’s report for the AI on every tool call, or injecting its text, doesn’t seem like it would be as productive as letting the AI patch its progress status into this report in a place reserved for per-issue updates, keeping a context where the tool-call history makes sense.

Moar

Tip

Implement a file-normalization workflow for the non-binary files the apply_patch tool can understand: decide what you want the AI to see, what you want the patch tool to accept or reject, and track the original state of each input file’s end-of-line sequence (LF, or the CR+LF typically seen with Windows files) so files get patched properly regardless.

So… you then don’t have to write detectors, guards, and repairs for past patches the AI model made, such as this in action:

[warning] Mixed line endings detected:
  - /functions_apply_patch_schema.txt (LF=34, CRLF=1, CR=0)
You will be prompted to convert to: UNIX (LF), Windows (CRLF), or use as-is when tracking.

Actions:
  0            - Track a new file name (permitted destination; not created yet)
  <number(s)>  - Track file(s) (e.g., '1' or '1,3,5')
  all          - Track all existing files
  clear        - Untrack all files
  [Enter]      - Return to chat

> 4

[warning] /functions_apply_patch_schema.txt has mixed line endings (LF=34, CRLF=1, CR=0)
Choose how to proceed:
  1 - Convert to UNIX (LF)
  2 - Convert to Windows (CRLF)
  3 - Use as-is
  4 - Skip tracking this file
Select [1/2/3/4]: 2
[workspace] Tracking: /functions_apply_patch_schema.txt
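A small Python sketch of the detection and conversion step such a workflow needs; the function names are hypothetical, but the LF/CRLF/CR counting mirrors the warning output shown above:

```python
def count_line_endings(data: bytes) -> dict:
    """Report LF / CRLF / CR counts, like a mixed-line-endings warning."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # bare LF only
    cr = data.count(b"\r") - crlf   # bare CR only (classic Mac style)
    return {"LF": lf, "CRLF": crlf, "CR": cr}

def normalize(data: bytes, eol: bytes = b"\n") -> bytes:
    """Collapse every ending to LF first, then expand to the target sequence."""
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified if eol == b"\n" else unified.replace(b"\n", eol)
```

Normalizing before the AI ever sees the file means patch hunks and file content agree on line endings, so the patch tool’s exact matching never fails on an invisible CR.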

Here’s an interesting write-up on how an internal team at OpenAI built and shipped an internal beta of a software product with zero lines of manually written code:

https://openai.com/index/harness-engineering/

It contains several practical insights and lessons worth reviewing. Sharing one paragraph as a preview:

What’s become clear: building software still demands discipline, but the discipline shows up more in the scaffolding rather than the code. The tooling, abstractions, and feedback loops that keep the codebase coherent are increasingly important.

P.S. I will rename this topic to “tips and tricks” so others can share useful ideas, even if they do not qualify as best practices.

3 Likes

I love the progressive disclosure approach! These days I am exclusively in Codex CLI with agent skills, and this is exactly how I work - maintain a ToC file (gets injected either in `AGENTS.md` or `SKILL.md`), together with `progress.md` that gets updated, and a set of other specific markdowns. This “garbage collection” process they have is also very interesting.

4 Likes

The practical change is that agents can stay coherent for longer, complete larger chunks of work end-to-end, and recover from errors without losing the thread.

Long horizon tasks with Codex by Derrick Choi

Long horizon tasks is an OpenAI Cookbook Codex example

Takeaways for long-horizon Codex tasks

What made this run work was not a single clever prompt. It was the combination of:

  • A clear target and constraints (spec file)
  • Checkpointed milestones with acceptance criteria (plans.md)
  • A runbook for how the agent should operate (implement.md)
  • Continuous verification (tests/lint/typecheck/build)
  • A live status/audit log (documentation.md) so the run stayed inspectable

This is the direction long-horizon coding work is moving toward: less babysitting, more delegation with guardrails.

2 Likes