I am exploring the new Responses API for my project, and I have run into a strange problem with the multi-shot approach using “previous_response_id” in my requests, which makes this endpoint compare very poorly to the Assistants API…
The first two shots are done with the o4-mini reasoning model, and when I set gpt-4o-mini as the model for the third shot, it cannot see the model’s previous responses (the messages with role=“assistant” disappear from the history), and the model itself states that nothing was answered before (but it was!). All the user messages, however, are preserved. The problem is not reproducible if I use the same reasoning model for the third shot.
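For reference, this is roughly how I chain the shots (a minimal sketch; the prompts are placeholders, but the models and the “previous_response_id” linking match my real code):

```python
from openai import OpenAI

client = OpenAI()

# Shots 1 and 2: reasoning model, chained with previous_response_id
first = client.responses.create(
    model="o4-mini",
    input="First question...",          # placeholder prompt
)
second = client.responses.create(
    model="o4-mini",
    input="Follow-up question...",      # placeholder prompt
    previous_response_id=first.id,
)

# Shot 3: switch to a non-reasoning model, still linked to the same thread
third = client.responses.create(
    model="gpt-4o-mini",
    input="Another follow-up...",       # placeholder prompt
    previous_response_id=second.id,
)

# Here the model behaves as if the earlier assistant turns never existed
print(third.output_text)
```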
I asked the “help bot” whether this is a bug or an intended limitation, and it responded:
This is an intended limitation of the Responses API. When you use “previous_response_id”, the expectation is to keep the same model within a thread to preserve context and history. Changing the model can result in loss of the prior context because each model may handle tokens, memory, and internal representations differently.
This behavior is not unique to gpt-4o-mini, but rather applies to all responses where a model switch is made mid-thread. What can you do?
To ensure that your model continues to see the full conversation history, keep the model the same throughout your multi-shot process when using “previous_response_id”.
If you must switch models, manually re-include the previous message history in your “messages” list for the new request, as you would for a stateless Chat Completions call.
Mixing models is a standard approach for multi-shot and multi-agent systems… This limitation is absurd… Different models are “cooked” for different tasks, and forcing users to keep the same model throughout a multi-shot conversation decreases quality and efficiency drastically! It was never a problem in the Assistants API.
@OpenAI_Support is it really intended behavior or just a hallucination of the help-bot? If it is intended, it goes absolutely against the positioning of the Responses API as a “stateful” system for building multi-agent systems.
If you look at the API response you should get a detailed error:
BadRequestError: Error code: 400 - {'error': {'message': 'Reasoning input items can only be provided to a reasoning or computer use model. Remove reasoning items from your input and try again.', 'type': 'invalid_request_error', 'param': 'input', 'code': None}}
But from the responses you already got, you can reconstruct the conversation and keep going, if you think it is worth it.
Thank you for the response! It is a really weird limitation… And by the way, in my case it does not throw any error: the endpoint responds successfully, but the answers are just meaningless, and in the logs I see this:
No, I did not reconstruct any part of the previous conversation in the list of input items. In all three shots, my input was always only a new string. I just used the previous response ID to link the thread.
@aprendendo.next fun fact, I continued the conversation with “help bot” and asked for some references to the documentation, and it looks like there’s still a chance it’s a bug, not intended behavior. Here is the last reply:
Thank you for challenging the information and asking for references from the official documentation. Based on a search of the available help center and documentation files, there is currently no statement or documented limitation in the Responses API documentation that says switching models during a multi-shot flow with “previous_response_id” breaks the conversation history or context. The documentation consistently describes that when you provide a previous response ID, the full history should be considered as input tokens, and you are simply charged for the total number of tokens included in the prompt. This is in line with your understanding. Therefore, the behavior you are observing—where history is dropped when switching models—does appear to go against the way the API’s conversation threading is described and could indicate a bug. You are correct that such a model-agnostic conversation threading—where all history is available regardless of the model used for each shot—is standard for OpenAI’s APIs including the Assistants API. Your request and technical details are very valuable…
Yeah, it makes sense; it would save costs to use a non-reasoning model to polish some minor details.
I was just looking for an easier way to retrieve the inputs, but unfortunately it still requires some effort.
Using the list input items endpoint we can get them, but reconstructing the inputs for a new request can be quite a piece of work depending on the variety of input modalities… the best way is still to save an array yourself as you interact, if there is any intention to switch models.
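Roughly what I mean (a sketch; the response ID is a placeholder, and the filtering is the part that takes effort once other modalities and tool items are involved):

```python
from openai import OpenAI

client = OpenAI()

# Fetch the stored input items of an earlier response (placeholder ID);
# mind pagination and ordering for long threads
items = client.responses.input_items.list("resp_abc123")

rebuilt_input = []
for item in items.data:
    # Keep only plain message items; reasoning / tool-call items would need
    # special handling (or dropping) before a non-reasoning model accepts them
    if item.type == "message":
        rebuilt_input.append({
            "role": item.role,
            "content": [part.model_dump() for part in item.content],
        })

# rebuilt_input can now be sent as `input` in a new request with a different model
```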
That was exactly the approach used in the Assistants API: you collect all the necessary messages in a thread and explicitly manage runs of different Assistants (with different models) based on your multi-shot logic and strategy.
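Something like this, from memory (a rough sketch; the assistant IDs are placeholders, and each assistant can be configured with a different model):

```python
from openai import OpenAI

client = OpenAI()

# One shared thread holds the whole conversation, whichever model runs on it
thread = client.beta.threads.create()

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="First question..."
)
# Run an assistant backed by a reasoning model (placeholder ID)
client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id="asst_reasoning"
)

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Now polish the answer..."
)
# Run a different assistant backed by a cheaper model on the same thread
client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id="asst_polisher"
)
```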
OpenAI announced the Responses API as a more efficient, stateful replacement for both the Assistants API and the stateless Chat Completions API, promising that all capabilities would be covered… But it looks like not all of them are…
Of course, I can manage the entire state of the conversation on my side and send it every time as input. But that makes using the Responses API vs. Chat Completions totally pointless, because Chat Completions is an industry standard: all the other competitors support its interface for easier migration (in case of any emergency), which doesn’t work with the Responses API. Totally pointless to create yet another stateless interface for the same tools…
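For completeness, the self-managed pattern I mean looks roughly like this (a sketch; the whole history lives on my side, so the Responses call is effectively stateless):

```python
from openai import OpenAI

client = OpenAI()
history = []  # the entire conversation state lives client-side

def ask(model: str, user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.responses.create(
        model=model,
        input=history,   # full history re-sent on every call
        store=False,     # nothing kept server-side
    )
    history.append({"role": "assistant", "content": resp.output_text})
    return resp.output_text

ask("o4-mini", "Draft the plan...")              # reasoning model
ask("gpt-4o-mini", "Now polish the wording...")  # model switch works, but it is Chat Completions with extra steps
```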
The Responses playground does not actually reuse stored responses.
It will re-send what is seen in the input boxes for messages.
However, it will also try to attach hidden “state” to a response, such as a reasoning ID, a code interpreter session ID, and others, thus making “store”: false almost futile there anyway.
Thus, one has to move over to the API and try the full pattern of use of passing previous response ID.
If doing chat self-management, you can receive and re-send encrypted reasoning items (or omit them completely), along with other things like reusing a code interpreter session; but some things just cannot be done well without server-side state, such as image gen edits. The API is documented (but not proven) to discard these reasoning items anyway when they are not around tool use. They may only provide some persistent intent behind why the AI thought tools were useful.
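If it helps, the self-managed variant with encrypted reasoning looks roughly like this (a sketch based on my reading of the docs; it requires “store”: false and a reasoning model, and whether the reasoning actually helps outside tool use is the open question above):

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="o4-mini",
    input=[{"role": "user", "content": "Step 1..."}],
    store=False,
    include=["reasoning.encrypted_content"],  # ask for re-sendable reasoning items
)

# Carry the returned output items (encrypted reasoning included) into the next turn yourself
next_input = list(resp.output) + [{"role": "user", "content": "Step 2..."}]
resp2 = client.responses.create(
    model="o4-mini",
    input=next_input,
    store=False,
    include=["reasoning.encrypted_content"],
)
print(resp2.output_text)
```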
Caution: if using the playground, you are creating stored Responses entries with no way in the user interface to delete them, and they continue to persist longer than 30 days. Not a good place to “make my picture into a cartoon”, and a poor pattern in general by OpenAI.
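(They can at least be deleted through the API itself; a minimal sketch, with a placeholder ID:)

```python
from openai import OpenAI

client = OpenAI()

# The playground UI has no delete button, but stored responses can be removed via the API
client.responses.delete("resp_abc123")
```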
Indeed, you are correct. The playground doesn’t use previous_response_id, it recreates the inputs.
I mentioned it because, since we lack documented guidance on why a stateful conversation that uses reasoning models (with or without reasoning summaries enabled) can’t be continued by non-reasoning models, the playground was the closest thing to an “official statement”.
But I think we have to acknowledge that the Responses API is slowly improving; it is much better now with code interpreter and other things that make it a bit closer to Assistants and “ChatGPT”.
Not “exactly” like them, but it is improving. We will get there eventually, hopefully.