Assistants API → Responses API: this is not a 1:1 migration

One thing that’s tripping people up is assuming the Responses API is just “Assistants without threads.”
It’s a bigger shift than that.

The Assistants API managed:

  • Threads

  • Runs

  • Tool attachment lifecycle

  • Polling / orchestration

The Responses API flips the model:

  • Stateless calls

  • No server-side thread lifecycle

  • No tool_resources injection at request time

  • App owns memory, retries, orchestration, and state

In practice, this means:

  • Your application becomes the conversation manager (see the sketch after this list)

  • Tools like file_search resolve context implicitly

  • Realtime / streaming becomes a deployment concern, not a model concern
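For concreteness, here's a minimal sketch of that app-managed pattern, assuming the Python SDK and a placeholder model name; the app keeps the message list itself and opts out of server-side storage:

```python
from openai import OpenAI

client = OpenAI()
history = []  # the app, not the server, is the source of truth


def ask(user_text: str) -> str:
    # Append the user turn, send the full history, append the reply.
    history.append({"role": "user", "content": user_text})
    resp = client.responses.create(
        model="gpt-4.1",  # placeholder model name
        input=history,
        store=False,      # opt out of server-side response storage
    )
    history.append({"role": "assistant", "content": resp.output_text})
    return resp.output_text
```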

If you’re migrating, think less “API replacement” and more “architecture change.”

Curious how others are handling multi-turn state and file context post-migration.

Doesn’t the Responses API have the concept of conversations stored by OpenAI, not your app?


Yes. While an essay-length diatribe could be written about the Responses API’s failings, the AI-generated statements in the first post, from an account that only posts AI-generated messages, are not true.

  1. Two stateful ways of keeping OpenAI-hosted chat history (sketch below):
  • Store responses (“store”: true), then pass previous_response_id to continue a chain of conversation.
  • Use the Conversations API: create an ID for your chat and pass it in any Responses call as the chat history. The Conversation object is updated with the latest input and the AI response. Unlike Assistants, you do not then need to retrieve the answer; it is served directly unless you use “background” mode.
  2. The thread parallel is addressed above; the only missing “lifecycle” control is that any stored conversation state plus new input will be run up to the maximum input of whatever model you call, with no budget limit.
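Here is a minimal sketch of both methods, assuming the Python SDK and a placeholder model name:

```python
from openai import OpenAI

client = OpenAI()

# Method 1: response chaining. Store each response server-side and
# pass its ID forward to continue the conversation.
first = client.responses.create(
    model="gpt-4.1", input="Hi, I'm planning a migration.", store=True
)
second = client.responses.create(
    model="gpt-4.1",
    input="Summarize what I just said.",
    previous_response_id=first.id,
)

# Method 2: Conversations API. One ID accumulates the whole history;
# each Responses call appends its input and output to the Conversation.
conv = client.conversations.create()
reply = client.responses.create(
    model="gpt-4.1",
    input="Hi again.",
    conversation=conv.id,
)
print(reply.output_text)  # served directly; no separate retrieval step
```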

  3. Tools: mostly true (see the sketch after this list).

  • file_search: you must create and pass vector store IDs that you manage yourself, per conversation, per user, or per application. Not following someone else’s pattern is the developer ideal, but unfortunately the injected tool spec gives the AI only one framing, informing it that “the user uploaded files”. New fees per use.
  • code_interpreter: you get a container ID for a session, created either automatically or by you on the containers API endpoint. Containers are ridiculously short-lived, with a new fee for every re-creation, and container contents are guarded and gated unless the AI correctly produces an in-response citation that lets you retrieve a file. Another symphony of suck, no better than ChatGPT.
  4. Your app owns x?
  • memory: the server-side conversation state methods above.
  • retries, orchestration? The Responses API server runs an internal iterator for its hosted tools, and for cognitive failures it is the AI that retries tools. Retries for API connection issues are already part of the OpenAI SDK.
  • state? Another way of saying memory.
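A sketch of the tool wiring described above, assuming the Python SDK; the vector store ID is a placeholder for one you created and manage yourself:

```python
from openai import OpenAI

client = OpenAI(max_retries=3)  # connection-issue retries are built into the SDK

resp = client.responses.create(
    model="gpt-4.1",  # placeholder model name
    input="What do the attached reports say about Q3?",
    tools=[
        # file_search: pass vector store IDs you manage per user/app/conversation
        {"type": "file_search", "vector_store_ids": ["vs_123"]},  # placeholder ID
        # code_interpreter: "auto" spins up a (short-lived) container for you;
        # you can instead pass a container ID created on the containers endpoint
        {"type": "code_interpreter", "container": {"type": "auto"}},
    ],
)
print(resp.output_text)
```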

Of course you must re-tool if you’ve been using easy-but-actually-harder Assistants and want to move to easy-but-ridiculous Responses, where streaming is a real concern: serving connections requires generator-gathering code that collects several dozen types of event (not whatever “deployment concern” the top post was describing).
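To make that concrete, a minimal streaming sketch, assuming the Python SDK; it collects only the two event types most apps need out of the several dozen the stream can emit:

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-4.1",  # placeholder model name
    input="Write one sentence about migration.",
    stream=True,      # returns a generator of typed events
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)  # incremental text
    elif event.type == "response.completed":
        print()  # terminal event; the full response object is now complete
```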

You’ve still got plenty of application state to manage: customers, their conversation sessions and metadata, their billing, subscriptions, and over-use, their moderation strikes, and the titles, shape, and expiry of their chats and resources, with database IDs and corresponding objects that would be needed on the Responses API just like on Assistants. All the API “chat hosting” is redundant to what must be synchronized with those API objects anyway.

“Doing it yourself”, rather than locking all customer resources behind an API bill and a scoped project you must keep working, is the right answer.

Yep — totally fair point.

There are server-side ways to persist continuity in Responses (via previous_response_id or Conversations), and that’s useful context.

What I was trying to highlight is less about whether state can exist, and more about who owns the lifecycle.

Compared to Assistants, Responses still shifts responsibility for:

  • memory strategy

  • orchestration across turns

  • tool usage boundaries

  • failure / retry semantics

So even with OpenAI-hosted conversation state, the architectural burden moves more clearly into the application layer.

I probably should’ve phrased it as “not Assistants-style managed” rather than “stateless” in the absolute sense.