I am exploring the new Responses API for my project, and I have run into a strange problem with the multi-shot approach using “previous_response_id” in my requests, which makes this endpoint compare very poorly to the Assistants API…
The first two shots are done with the o4-mini reasoning model, and when I set gpt-4o-mini as the model for the third shot, it cannot see the model’s previous responses (the messages with role=“assistant” disappear from the history), and the model itself states that nothing was answered before (but it was!). All the user messages, however, are preserved. The problem is not reproducible if I use the same reasoning model for the third shot.
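For reference, this is roughly how I chain the shots (a minimal sketch; the prompts are placeholders, but the models and the “previous_response_id” linking match my real code):

```python
from openai import OpenAI

client = OpenAI()

# Shots 1 and 2: reasoning model, chained with previous_response_id
first = client.responses.create(
    model="o4-mini",
    input="First question...",          # placeholder prompt
)
second = client.responses.create(
    model="o4-mini",
    input="Follow-up question...",      # placeholder prompt
    previous_response_id=first.id,
)

# Shot 3: switch to a non-reasoning model, still linked to the same thread
third = client.responses.create(
    model="gpt-4o-mini",
    input="Another follow-up...",       # placeholder prompt
    previous_response_id=second.id,
)

# Here the model behaves as if the earlier assistant turns never existed
print(third.output_text)
```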
I asked the “help bot” whether this is a bug or an intended limitation, and it responded:
This is an intended limitation of the Responses API. When you use “previous_response_id”, the expectation is to keep the same model within a thread to preserve context and history. Changing the model can result in loss of the prior context because each model may handle tokens, memory, and internal representations differently.
This behavior is not unique to gpt-4o-mini, but rather applies to all responses where a model switch is made mid-thread. What can you do?
To ensure that your model continues to see the full conversation history, keep the model the same throughout your multi-shot process when using “previous_response_id”.
If you must switch models, manually re-include the previous message history in your “messages” list for the new request, as you would for a stateless Chat Completions call.
Mixing models is a standard approach for multi-shot and multi-agent systems… This limitation is absurd… Different models are “cooked” for different tasks, and forcing users to keep the same model throughout a multi-shot conversation decreases quality and efficiency drastically! It was never a problem in the Assistants API.
@OpenAI_Support is it really intended behavior or just a hallucination of the help-bot? If it is intended, it goes absolutely against the positioning of the Responses API as a “stateful” system for building multi-agent systems.
If you look at the API response you should get a detailed error:
BadRequestError: Error code: 400 - {'error': {'message': 'Reasoning input items can only be provided to a reasoning or computer use model. Remove reasoning items from your input and try again.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
But from the responses you already got, you can reconstruct the conversation and keep going, if you think it is worth it.
Thank you for the response! It is a really weird limitation… And by the way, in my case it does not throw any error: the endpoint responds successfully, but the answers are just meaningless, and in the logs I see this:
No, I did not reconstruct any part of the previous conversation in the list of input items. In all three shots, my input was always only a new string. I just used the previous response ID to link the thread.
@aprendendo.next fun fact, I continued the conversation with “help bot” and asked for some references to the documentation, and it looks like there’s still a chance it’s a bug, not intended behavior. Here is the last reply:
Thank you for challenging the information and asking for references from the official documentation. Based on a search of the available help center and documentation files, there is currently no statement or documented limitation in the Responses API documentation that says switching models during a multi-shot flow with “previous_response_id” breaks the conversation history or context. The documentation consistently describes that when you provide a previous response ID, the full history should be considered as input tokens, and you are simply charged for the total number of tokens included in the prompt. This is in line with your understanding. Therefore, the behavior you are observing—where history is dropped when switching models—does appear to go against the way the API’s conversation threading is described and could indicate a bug. You are correct that such a model-agnostic conversation threading—where all history is available regardless of the model used for each shot—is standard for OpenAI’s APIs including the Assistants API. Your request and technical details are very valuable…
Yeah, it makes sense; it would save costs to use a non-reasoning model to polish some minor details.
I was just looking for an easier way to retrieve the inputs, but unfortunately it still requires some effort.
Using the list input items endpoint we can get them, but reconstructing the inputs for a new request can be quite a piece of work depending on the variety of input modalities… the best way is still to save an array yourself as you interact, if there is any intention to switch models.
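Roughly what I mean (a sketch; the response ID is a placeholder, and the filtering is the part that takes effort once other modalities and tool items are involved):

```python
from openai import OpenAI

client = OpenAI()

# Fetch the stored input items of an earlier response (placeholder ID);
# mind pagination and ordering for long threads
items = client.responses.input_items.list("resp_abc123")

rebuilt_input = []
for item in items.data:
    # Keep only plain message items; reasoning / tool-call items would need
    # special handling (or dropping) before a non-reasoning model accepts them
    if item.type == "message":
        rebuilt_input.append({
            "role": item.role,
            "content": [part.model_dump() for part in item.content],
        })

# rebuilt_input can now be sent as `input` in a new request with a different model
```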
That was exactly the approach used in the Assistants API: you collect all the necessary messages in a thread and explicitly manage runs of different Assistants (with different models) based on your multi-shot logic and strategy.
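Something like this, from memory (a rough sketch; the assistant IDs are placeholders, and each assistant can be configured with a different model):

```python
from openai import OpenAI

client = OpenAI()

# One shared thread holds the whole conversation, whichever model runs on it
thread = client.beta.threads.create()

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="First question..."
)
# Run an assistant backed by a reasoning model (placeholder ID)
client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id="asst_reasoning"
)

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Now polish the answer..."
)
# Run a different assistant backed by a cheaper model on the same thread
client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id="asst_polisher"
)
```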
OpenAI announced the Responses API as a more efficient, stateful replacement for both the Assistants API and the stateless Chat Completions API, promising that all capabilities would be covered… But it looks like not all of them are…
Of course, I can manage the entire state of the conversation on my side and send it every time as input. But that makes using the Responses API vs. Chat Completions totally pointless, because Chat Completions is an industry standard: all the other competitors support its interface for easier migration (in case of any emergency), which doesn’t work with the Responses API. Totally pointless to create yet another stateless interface for the same tools…
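For completeness, the self-managed pattern I mean looks roughly like this (a sketch; the whole history lives on my side, so the Responses call is effectively stateless):

```python
from openai import OpenAI

client = OpenAI()
history = []  # the entire conversation state lives client-side

def ask(model: str, user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.responses.create(
        model=model,
        input=history,   # full history re-sent on every call
        store=False,     # nothing kept server-side
    )
    history.append({"role": "assistant", "content": resp.output_text})
    return resp.output_text

ask("o4-mini", "Draft the plan...")              # reasoning model
ask("gpt-4o-mini", "Now polish the wording...")  # model switch works, but it is Chat Completions with extra steps
```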
The Responses playground does not actually reuse stored responses.
It will re-send what is seen in the input boxes for messages.
However, it will also try to attach hidden “state” to a response, such as a reasoning ID, a code interpreter session ID, and others, thus making “store”: false almost futile there anyway.
Thus, one has to move over to the API and try the full pattern of use of passing previous response ID.
If doing chat self-management, you can receive and re-send encrypted reasoning items (or omit them completely), along with other things like reusing a code interpreter session; but some things just cannot be done well without server-side state, such as image gen edits. The API is documented (but not proven) to discard these reasoning items anyway when they are not around tool use. They may only provide some persistent intent behind why the AI thought tools were useful.
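If it helps, the self-managed variant with encrypted reasoning looks roughly like this (a sketch based on my reading of the docs; it requires “store”: false and a reasoning model, and whether the reasoning actually helps outside tool use is the open question above):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="o4-mini",
    input=[{"role": "user", "content": "Step 1..."}],
    store=False,
    include=["reasoning.encrypted_content"],  # ask for re-sendable reasoning items
)

# Carry the returned output items (encrypted reasoning included) into the next turn yourself
next_input = list(resp.output) + [{"role": "user", "content": "Step 2..."}]
resp2 = client.responses.create(
    model="o4-mini",
    input=next_input,
    store=False,
    include=["reasoning.encrypted_content"],
)
print(resp2.output_text)
```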
Caution: if using the playground, you are creating stored Responses entries with no way in the user interface to delete them, and they continue to persist longer than 30 days. Not a good place to “make my picture into a cartoon”, and a poor pattern in general by OpenAI.
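(They can at least be deleted through the API itself; a minimal sketch, with a placeholder ID:)

```python
from openai import OpenAI

client = OpenAI()

# The playground UI has no delete button, but stored responses can be removed via the API
client.responses.delete("resp_abc123")
```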
Indeed, you are correct. The playground doesn’t use previous_response_id, it recreates the inputs.
I mentioned it because, since we lack documented guidance on why a stateful conversation that uses reasoning models (with or without reasoning summaries enabled) can’t be continued by non-reasoning models, the playground was the closest thing to an “official statement”.
But I think we have to acknowledge that the Responses API is slowly improving; it is much better now with code interpreter and other things that make it a bit closer to Assistants and “ChatGPT”.
Not “exactly” like them, but it is improving. We will get there eventually, hopefully.