Introducing the Responses API

We’re making improvements to how you build assistants and use tools with the OpenAI API. Based on your feedback from the Assistants API beta, we’ve built the Responses API — a faster, more flexible, and easier way to create agentic experiences that combines the simplicity of Chat Completions with the tool use and state management of the Assistants API. To start, the Responses API will support new built-in tools like web search, file search, and computer use. You can read more about Responses and these new tools in our blog post.

Here’s what this means for the Assistants API:

  • Feature parity between Responses and Assistants: We’re working on bringing key Assistants API features — like support for Assistant-like and Thread-like objects, plus the Code Interpreter tool — into the Responses API.
  • Deprecation timeline: Once parity is reached, we will announce the deprecation of the Assistants API in the first half of 2026 with 12 months of support from the deprecation date so you have ample time to migrate.
  • Migration support: When we announce the deprecation date, we’ll also provide a comprehensive migration guide to help you move smoothly to Responses, with full data preservation.
  • No immediate changes: The Assistants API will continue to be supported in the near term, and we’ll continue to add new models to it. We’re planning deprecation in 2026, but we’ll follow up again to give you notice of the full plan.

You can get started in our docs. I’m excited to see what you build with these new APIs and tools, and please don’t hesitate to reach out here if you have any questions.


I’m in the early stages of building a chat-focused app using the Assistants API. Would it make sense to jump to Responses now or continue on this path and wait for the migration?


I’m in the same situation and would love an honest opinion on this. Should I stick with the Assistants API for now or switch to Responses ahead of the migration?

Does it work with the Batch API?

@ben.mcgee.good Switching to the Responses API at this time would make sense; alternatively, you could use the Chat Completions API to familiarize yourself with building, and then migrate to the Responses API.

So that’s why the Assistants API was in beta all this time.


The Assistants API is slow too.


OMG - so SLOOOOOW. Hope the Responses API is faster.


If I’m not wrong, the Responses API does not yet have some features you would want from the Assistants API, like threads, so people building chat-like interfaces can still rely on the Assistants API.

Seems Responses is the way to go, but migrating shouldn’t be that hard either.

The Responses API doesn’t have threads (a single list of messages and internal calls that you can add to and run).

Instead, for server-side chat history state, it stores the response ID and its completion contents with every API call you make.

There is also an API parameter, “instructions”, if you want the first system message to be separately changeable instead of being part of the past messages sent.

To use this (in the manner that many novices would expect API calls to somehow “remember” exactly who they are), you take the most recent response’s ID and send it back as the previous_response_id parameter. When you do that, the past chat and responses you specify are reused, and you only need the newest user-role question as a message in input.

There is still no cost management: the chat length will grow to the model maximum, where the API will return an error, or you can set truncation to “auto” to start dropping old messages at the model’s maximum input.
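To make that concrete, here is a minimal sketch using the official Python SDK; the model name is illustrative, and the only Responses-specific parameters shown are the ones described above (instructions, previous_response_id, truncation):

from openai import OpenAI

client = OpenAI()

# First turn: "instructions" inserts a system/developer message ahead of
# the input for this call only.
first = client.responses.create(
    model="gpt-4o-mini",  # illustrative model name
    instructions="You are a terse assistant.",
    input="What is the capital of France?",
)
print(first.output_text)

# Follow-up turn: chain to the stored server-side state by ID. Only the
# newest user message goes in input; the stored turns are replayed for you.
second = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input="And its population?",
    truncation="auto",  # drop oldest context instead of erroring at the limit
)
print(second.output_text)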


I went ahead with the migration from Assistants to Responses; it’s not really that big a deal, and it does seem faster. One bit of the documentation is a little confusing, though. Say I want my assistant to keep the instructions I give it in mind throughout a chat, and I am using previous_response_id to keep the context. Do I add the instructions to every request, or depend on the model to access them through the chain back to the initial request? Or do I shift to sending them with a “developer” role?


Hey Ben, I just took a look at the documentation for the Responses API, and it seems the previous instructions get dropped every time you use the previous_response_id field, so you need to add the instructions to every request.

https://platform.openai.com/docs/api-reference/responses/object

instructions
string or null

Inserts a system (or developer) message as the first item in the model’s context.

When using along with previous_response_id, the instructions from a previous response will not be carried over to the next response. This makes it simple to swap out system (or developer) messages in new responses.
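In practice, that means re-sending the instructions on every chained call if you want them to persist. A minimal sketch, reusing the client and first response from the earlier snippet:

# Instructions are not carried over with previous_response_id, so
# re-supply them on each chained turn.
reply = client.responses.create(
    model="gpt-4o-mini",  # illustrative
    instructions="Answer in French.",  # must be re-sent every call
    previous_response_id=first.id,
    input="What's the weather like today?",
)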


It seems that the recommendation is to use the instructions parameter for the first chat in the conversation and then switch to adding them as a developer message in each request as the chain continues. https://platform.openai.com/docs/guides/text?api-mode=responses#message-roles-and-instruction-following

Honestly this feels a bit awkward… but I guess it gives more fine-grained control through the conversation than just setting the instructions once at the beginning, as in the Assistants API. E.g., you might want to change the developer message (aka persistent instructions) depending on where the chat went.

That doesn’t quite have the effect you’d want. Not recommended. The page linked is poor.

Instructions is its own message: dynamically placed or not placed when you use that parameter, inserting its own initial message per API call.

If you were to send an “input” of a developer-role and a user-role message, that part would become the permanent replayed conversation when reusing a response ID.

If you were to also add instructions to that initial API call, the context would look like:

system: Image input capabilities: Enabled
system: (instructions by parameter)
system: (system from input)
user: What is up, homeboy?

(The image capabilities line, a cutoff date, or secret instructions: that’s what OpenAI is up to before you have control.)

If you were to “switch to adding them as a developer message” mid-conversation, you’d have removed your dynamic instructions parameter and would be sending an additional system instruction into the conversation, before the user input, with every new addition to the persistent state; these would pile up in the chat history (and maybe finally be followed).

Thus: just pick one (sketched below):

  1. if your chat will have a permanent setting, send the system message once in input, OR
  2. use the instructions parameter and only send user messages.
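A minimal sketch of option 1, under the same assumptions as the earlier snippets (Python SDK, illustrative model name):

from openai import OpenAI

client = OpenAI()

# Option 1: send the system message once in input; it becomes part of the
# stored conversation and is replayed on every chained call.
first = client.responses.create(
    model="gpt-4o-mini",
    input=[
        {"role": "system", "content": "You are a pirate."},
        {"role": "user", "content": "Where do we sail first?"},
    ],
)

# Later turns chain by ID and send only the new user message.
later = client.responses.create(
    model="gpt-4o-mini",
    previous_response_id=first.id,
    input=[{"role": "user", "content": "And after that?"}],
)

# Option 2 would instead skip the system message in input and pass
# instructions="You are a pirate." on every call.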

My question is: will we be able to use other models, like OpenRouter LLMs, by just changing the base URL? The Chat Completions SDK is flexible with this. It’s probably not up to OpenAI, though, but to other LLM services, if they want to support a different SDK. Either way, it’s the main reason for sticking with chat.completions, for me anyway.

I haven’t tried the Responses API yet, but my feeling is that the thread was a better model than this linked chain of previous answers. However, the answers that may sway me toward the Responses API bandwagon are: Are responses with previous IDs faster than the old Assistants threads? Will responses become slower over time as the linked list of previous responses grows? And finally: are the results returned from previous function calls considered, or only the textual response? This is important because it may help avoid calling the same API again.

What is the correct “purpose” to use with files slated for a Responses API call? For Assistants, we put “assistants” in so the platform knows to grab that data into the vector store. Do files for Responses API calls need to be tagged in any special way? From the API docs:

The intended purpose of the uploaded file. One of:

  • assistants: Used in the Assistants API
  • batch: Used in the Batch API
  • fine-tune: Used for fine-tuning
  • vision: Images used for vision fine-tuning
  • user_data: Flexible file type for any purpose
  • evals: Used for eval data sets
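For reference, the upload call itself looks like the sketch below; whether Responses-adjacent tooling expects “assistants” (for vector store files) or the flexible “user_data” purpose is exactly the open question here, so treat the purpose value as an assumption:

from openai import OpenAI

client = OpenAI()

# Hedged sketch: "user_data" is the docs' flexible catch-all; it is an
# assumption that it is the right tag for Responses-bound files.
uploaded = client.files.create(
    file=open("notes.pdf", "rb"),  # illustrative file name
    purpose="user_data",
)
print(uploaded.id)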

Will the Responses API be able to handle more than one persona in the context thread? Today we have “assistant” and “user” - will the new API tolerate multiple roles in a single context?

The output examples for the Responses API still show that it is using “assistant”. See below:

"output": [
  {
    "type": "message",
    "id": "msg_67ccd2bf17f0819081ff3bb2cf6508e60bb6a6b452d3795b",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "output_text",
        "text": "In a peaceful grove beneath a silver moon, a unicorn named Lumina …",
        "annotations": []
      }
    ]
  }
]