Codex OpenAI Completions → Responses Migration Pack

You can now use Codex CLI to quickly upgrade from Chat Completions to Responses!

This toolkit:

  • Finds legacy usage in your repo
  • Proposes and applies edits
  • Updates import/request shapes
  • Runs tests/lints
  • Makes a clean branch + PR

Find it here: GitHub – openai/completions-responses-migration-pack: a developer toolkit to migrate applications from the legacy OpenAI Completions/Chat Completions APIs to the unified Responses API, guided by Codex CLI.

Or just run /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/openai/completions-responses-migration-pack/main/scripts/completions-to-responses-upgrade.sh)"


Improved reasoning quality and cache utilization when compared to Chat Completions.

It would seem to me that instead, Responses fails in anything related to “cache utilization”.

According to the documentation, reasoning items are replayed as input before a user turn, but are then dropped at OpenAI’s discretion once they are no longer adjacent to tool calls.

Dropping an item before an input turn instantly breaks the cache for that turn. Never dropping, and your context stays roughly 3x as large as the visible output turns, forever, because of the retained internal reasoning.

Then there is the fact that using either “conversations” or a “previous response ID” as stateful “chat” storage is completely unmanaged in length. You can run it up to the maximum, and then your only choices are an error, or a cache break every turn as unknown context is dropped.

Nothing on the endpoint is aware of cache persistence or expected expiry, so nothing knows when to elide a conversation back to a budget, or by how much per model. It just grows until failure.
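Since the endpoint won’t do this for you, eliding a conversation back to a budget has to live in your own code. A minimal sketch, with a hypothetical `elide_to_budget` helper and a crude character-based token estimate standing in for a real tokenizer such as tiktoken:

```python
# Hypothetical sketch: the Responses API will not trim a stored conversation
# for you, so keep the message list in your app and elide the oldest turns
# back to a token budget before each request.

def estimate_tokens(message: dict) -> int:
    # Rough heuristic: ~4 characters per token. A real application would
    # count tokens with the target model's actual tokenizer.
    return max(1, len(message.get("content", "")) // 4)

def elide_to_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep the system message plus the most recent turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    kept, used = [], sum(estimate_tokens(m) for m in system)
    for m in reversed(turns):  # walk newest-first so recent turns survive
        cost = estimate_tokens(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Note that trimming from the front breaks the cached prefix once at the moment of trimming, after which subsequent turns can cache again; unmanaged growth instead guarantees an eventual hard failure.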

Instructions: these are not a dynamic pre-prompt or post-prompt; change them and you have a cache-killer there also. You have no mechanism to place late-turn content non-permanently (except for OpenAI’s own tool system-message injections before and after user input, which break the cache and developer intentions yet again). Forget placing RAG where it needs to go, in a role that doesn’t exist for it.
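The workaround, to the extent one exists, is ordering the request yourself so the stable text comes first and the volatile text comes late. A hypothetical sketch (the `build_messages` helper and role placement are assumptions, not an official pattern):

```python
# Hypothetical sketch: to preserve prefix caching, keep volatile content
# (retrieved documents, timestamps) out of the instructions and append it
# in a late turn, so the stable prefix stays byte-identical across requests.

STABLE_INSTRUCTIONS = "You are a support assistant."  # never changes

def build_messages(history: list[dict], rag_chunks: list[str],
                   question: str) -> list[dict]:
    messages = [{"role": "system", "content": STABLE_INSTRUCTIONS}]  # cacheable prefix
    messages += history  # prior turns, replayed unchanged
    if rag_chunks:
        # Volatile retrieval results go in a late user turn rather than the
        # system prompt, invalidating as little of the cached prefix as possible.
        messages.append({"role": "user",
                         "content": "Context:\n" + "\n---\n".join(rag_chunks)})
    messages.append({"role": "user", "content": question})
    return messages
```

With self-managed Chat Completions state this ordering is entirely under your control; on Responses with server-side state, it isn’t.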

“Better reasoning and lower costs” is purely hypothetical, unless a developer does state management themselves, avoids tools, and controls the language instructions of their own functions.

IMO, a working tuned application has no reason to leave Chat Completions, unless you also want 3x the network bandwidth streamed back at you in exchange for the savings in sending. I would not over-promise. And unlike Responses, with self-managed state you can switch chat AI providers the second you get a timeout.
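That last point is concrete: when the full message list lives in your application rather than on OpenAI’s servers, a failover is just resending the same list elsewhere. A minimal sketch, where the provider names and `send()` callables are placeholders for real client calls:

```python
# Hypothetical sketch: with self-managed Chat Completions state, the
# conversation is local, so on a timeout you can replay the identical
# request against any OpenAI-compatible provider.

def complete_with_fallback(messages: list[dict], providers: list[tuple]):
    """Try each (name, send) provider in order; return the first success."""
    last_error = None
    for name, send in providers:
        try:
            return name, send(messages)
        except TimeoutError as exc:
            last_error = exc  # state is local, so simply try the next provider
    if last_error is None:
        raise RuntimeError("no providers configured")
    raise last_error
```

With server-side Responses state (“conversations” or `previous_response_id`), the history is pinned to one provider and no such fallback is possible.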


Maybe, on Responses, make “conversations” not fail to store anything in “background” mode, if you’ve got some coding time over there, friends. And make the gpt-5 cache mechanism actually discount anything at all, unless your motivation is to break discounting on Chat Completions completely.

In this context, “aware of cache persistence and expected expiry to know when to elide a conversation back to a budget, or by how much by model. It just grows until failure.” refers to the need for the system, or specifically the developer using OpenAI’s Responses endpoint, to actively manage the amount of ongoing conversation data (context) being cached and retained for each chat session.

Unlike Chat Completions, where the user can more clearly manage what context is sent, the Responses endpoint does not automatically handle trimming or discarding parts of a conversation when reaching memory or token limits. “Cache persistence” means how long context/history is stored; “expected expiry” is when old context should be removed. “Eliding a conversation back to a budget” means trimming older conversation parts to stay within allowed resource limits (the “budget”).

Since the endpoint does not help manage this and doesn’t notify you about cache expiration or removal policies, the context keeps growing until it fails by hitting a maximum limit, forcing developers to manage the state and context-aware caching logic themselves.