I am going to start transitioning from the Assistants API to the Responses API sooner rather than later.
There are currently no docs on how to do this, so I'm trying to figure it out on the fly.
My main issue is understanding the thread_id alternative in the Responses API.
I understand that you can use previous_response_id to continue the conversation, but what is actually being fetched?
My concern is tokens. In my use case with the Assistants API I use file search: this retrieves chunks, adds them to the input, and then generates a response.
With the Responses API I have read that previous_response_id adds ALL previous inputs into the current input. Some of my previous inputs are around 20k input tokens (90% coming from retrieval). If the Responses API adds all previous inputs, that's going to get expensive, fast.
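For context, this is how I understand the chaining is supposed to work (a minimal sketch; the vector store ID is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

# Turn 1: the response state is stored server-side (store defaults to true).
first = client.responses.create(
    model="gpt-4o",
    input="What does the pricing document say about refunds?",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_placeholder"]}],
)

# Turn 2: previous_response_id pulls the stored conversation back into the
# model's input context. My understanding is that the prior turn, including
# its retrieval output, is re-sent as input and billed again on this call.
second = client.responses.create(
    model="gpt-4o",
    input="Summarise that in one sentence.",
    previous_response_id=first.id,
)
```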
Can someone please shed some light on transitioning from the Assistants API to the Responses API, bearing in mind the above concern?
Would really appreciate some help! Thanks in advance!
I have the exact same setup in the Assistants API and the Responses API, except obviously I have to use previous_response_id on every turn for the Responses API.
Both use file search with the exact same settings.
Same conversation: 5 messages in and out.
Tokens used:
Assistants API: 58,363 total (in/out)
Responses API: 145,809 total (in/out)
I can also see a cached tokens figure in the usage output, which keeps increasing on each turn.
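For what it's worth, this is how I'm reading the usage numbers on each turn (field names as I see them on the Responses usage object; the previous_response_id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",
    input="Next question in the same conversation...",
    previous_response_id="resp_abc123",  # placeholder id from the previous turn
)

# Per-turn usage: input_tokens keeps growing as the history is re-sent;
# cached_tokens is the part of that re-sent prefix served from the prompt cache.
usage = resp.usage
print(usage.input_tokens, usage.output_tokens, usage.total_tokens)
print(usage.input_tokens_details.cached_tokens)
```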
OK, CLEARLY this is not feasible, for two reasons: context window and cost.
If this is the new way, or the way it is meant to be used, then OpenAI just buried themselves, but I'm sure they can't be that stupid.
I was thinking about making the same transition in my application by replacing the Assistants API with the Responses API.
However, it seems worthwhile to wait a little longer; I believe that in the future the Responses API will have more features that could justify the change.
How is this different from the Assistants API? Clearly the token count is different, but why aren't you limiting it the way you previously would with max_tokens?
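Something like this is what I had in mind (assuming max_output_tokens is the Responses-side equivalent; the previous_response_id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",
    input="Answer briefly.",
    previous_response_id="resp_abc123",  # placeholder
    max_output_tokens=500,  # limits the number of generated output tokens for this call
)
```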
I started the transition this week too, and found it to be insanely fast and much richer in terms of the context gathered (file_search).
In my app, I'm keeping the Thread model, but instead of holding a bunch of messages, it now holds a bunch of responses (each with a bunch of items).
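Roughly, the shape I'm using looks like this (names are hypothetical, just to show the idea):

```python
from dataclasses import dataclass, field

@dataclass
class StoredResponse:
    response_id: str                                  # id returned by client.responses.create(...)
    items: list[dict] = field(default_factory=list)   # output items: messages, file_search calls, etc.

@dataclass
class Thread:
    thread_id: str
    responses: list[StoredResponse] = field(default_factory=list)

    @property
    def last_response_id(self) -> str | None:
        # Passed as previous_response_id on the next turn in this thread.
        return self.responses[-1].response_id if self.responses else None
```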
The main concern here is the large amount placed into the model's input context by the growing "history" from a sequence of API calls.
It is indeed a concern. There is no way to manage how much you want to spend.
With a large server-side conversation state and no management, you pay for the input once, and if the model invokes a file search, you pay for that input again when the AI is internally given another generation to respond to the search results, which can itself be yet another attempt at a search. With input costs piling up like that, the billing for one call can be larger than the entire model context.
The truncation parameter only serves to keep you running at the maximum context, instead of the default behavior of an API error when you have exceeded the model context (which can happen even internally).
Thus: server-side state is not ready for a "chat" production environment. Even in the best case, where the AI doesn't persist with tool calls, you can potentially pay $0.50 per gpt-4o API call.
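For completeness, the truncation setting I mentioned looks like this; it only avoids the context-overflow error, it does nothing to cap what you are billed for (a minimal sketch; the previous_response_id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4o",
    input="Another turn in a long conversation.",
    previous_response_id="resp_abc123",  # placeholder
    truncation="auto",  # drop middle items instead of erroring when the context is exceeded
)
```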