Can Codex Low match Medium on easy tasks?

Here is another use case I just ran, this time with a more advanced model and thinking enabled. In hindsight, given the nature of the task, a lower-tier model with thinking disabled would likely have worked just as well.

While building a project with CMake, I hit a few hundred instances of the same compiler error. All of them ultimately traced back to a change in a single C macro.

I used the more advanced model with thinking enabled to perform the analysis that led from the compiler errors back to the specific macro definition. That kind of reasoning step—from noisy build output to the root macro line—is not something I would expect a lower model with thinking disabled to handle reliably.

However, once the initial macro was identified, the remaining work was to document the macro expansion chain. At that point I did not change the model or thinking settings, even though the task had become straightforward: simple read-only searching and documentation of the expansion steps. A lower model without thinking would have been sufficient for that phase.

In practice, I keep a close eye on token usage over a given time window. When I have plenty of tokens available, I tend not to switch models. When tokens are tight, I switch more aggressively, or even fall back to issuing manual commands and doing parts of the work myself to conserve tokens.


Update: I have to post it in this topic, as I am limited to only two replies in a row.

Over the past two days, I have been working on skills and their supporting scripts while also submitting GitHub pull requests. A few hours into the task, I remembered this topic and switched to a lower-capacity model.

The lower model did not produce the strongest answer on the first attempt, nor was it able to complete as much of the work independently. However, this introduced an interesting side effect: I had to actively guide the model toward a better solution, drawing on prior experience and carefully reviewing assumptions and lines of reasoning that were incorrect. In practice, this meant thinking through the problem again rather than simply reviewing a proposed action and approving it. The process was unexpectedly engaging.

As a concrete example, I have a Visual Studio Code project that uses a multi-root workspace with two separate Git repositories open, not rooted in the same workspace folder. The model incorrectly inferred that one of the workspace folders was itself a Git repository, when in fact only the skills directory inside it was version-controlled. After verifying the repository structure and explicitly providing the full path to the `.git` directory, the model recognized the error and produced an improved response.
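Verifying that structure by hand is cheap. A sketch with hypothetical paths (the real workspace layout was different), assuming only the `skills` folder is a repository:

```shell
# Hypothetical layout: two workspace folders, only one under version control.
mkdir -p /tmp/ws/app /tmp/ws/skills
git -C /tmp/ws/skills -c init.defaultBranch=main init -q

# --is-inside-work-tree succeeds (printing "true") only under a repository.
git -C /tmp/ws/skills rev-parse --is-inside-work-tree
git -C /tmp/ws/app rev-parse --is-inside-work-tree 2>/dev/null || echo "not a repo"

# --absolute-git-dir prints the exact .git path to hand to the model.
git -C /tmp/ws/skills rev-parse --absolute-git-dir
```

Running checks like these before answering the model's questions is what let me hand it the correct `.git` path instead of letting the wrong inference stand.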

While the revised answer was still not optimal, it represented a clear step toward the final solution.
