Responses API: Question about managing conversation state with previous_response_id

jason.griffing · March 12, 2025, 4:57pm

I’ve been tinkering and want to make sure I’m tracking correctly. It appears that managing conversation state on the server side by passing the
previous_response_id is limited to only the last message?

Am I missing something? I’m looking for a way to extend this functionality such that I could, for example, pass a single parameter that would fetch the entire history of the conversation, not just the last message.

1st_contact · March 12, 2025, 5:27pm

It seems to get the whole thing. I tried it just now:

Hello! This is a test of whether my script’s history is working. In a few turns, I’ll ask you what the magic word is (it’s banana). Don’t mention it until then!

Got it! I’ll wait for you to ask.

In the meantime, how are you today?

I’m just a program, but I’m here and ready to help! How are you doing?

Doing OK I guess.

I’m glad to hear that! If there’s anything you’d like to chat about or need help with, feel free to let me know.

What was the magic word?

The magic word is banana.

jason.griffing · March 12, 2025, 5:54pm

Interesting. I was testing last night and did not get this result. My test case was a little more convoluted though. I’ll have to try this out the way you did. Thank you!

1st_contact · March 12, 2025, 6:05pm

Here’s the code I used just in case.

from openai import OpenAI
api_key = "REDACTED"
client = OpenAI(api_key = api_key)
last_id = None
while True:
	user_input = input("\n> ")
	if user_input.lower() in ["exit", "quit"]:
		break
	if last_id:
		response = client.responses.create(model = "gpt-4o", input = user_input, previous_response_id = last_id)
	else:
		response = client.responses.create(model = "gpt-4o", input = user_input)
	last_id = response.id
	print("\n" + response.output_text)

jason.griffing · March 12, 2025, 7:11pm

Thanks! I was able to replicate this on my end so indeed, it appears to be grabbing the whole history. Appreciate your assistance.

roginn · March 12, 2025, 8:12pm

Are all previous tokens in the conversation charged as input tokens in this case? (ie, same as manually managing state with the completions API)

mstefanec · March 12, 2025, 8:13pm

What if I want not the whole history but only last n messages? How to do this?

1st_contact · March 12, 2025, 11:15pm

Are all previous tokens in the conversation charged as input tokens in this case?

The docs say “Even when using previous_response_id, all previous input tokens for responses in the chain are billed as input tokens in the API.”

fanli · March 13, 2025, 3:49am

I suppose eventually it should support truncation strategies similar to the Assistants API?

jr_aulakh · March 19, 2025, 9:49am

Hey man thanks for responding in this thread.

Im trying to wrap my head around this, but struggling.

What i cant understand is the context of the input tokens being passed into each turn using previous_response_id.

for example, in my use case i am using file search in the assistants api. for every turn this passes chunks received from retrieval + user message.

now if in response api previous_response_id account for EVERY input, does that mean each retrieval input is all inputted. If so, that’s ridiculous.

Any help would be great please guys!

jr_aulakh · March 19, 2025, 11:56am

Ok so i think i get it, but It would be good to know how openai are handling it.

So if you want consistent memory as a thread.

Initial call ( first message)
get response (store the previous_response_id)
create second call and add the previous_response_id.

and then so forth.

It must be a chain, so each turn you must give the last (previous_response_id).

It would be good to know how openai link all those responses together.

If i am completely honest, having a thread_id is a lot more simple. Can anyone see any benefits in the above workflow, rather than having a thread id.

Also it would be good to know if the “instructions” is used as tokens on each turn.

Also from testing i can see how this can get very messy in the api dashboard under logs. Its very difficult to follow a conversation.

rklf · May 3, 2025, 11:28pm

Indeed, on each call we have to provide a previous_response_id to act as a conversation — but this increase tokens usage so much… A first response which use ~3000 tokens goes to ~15000 tokens after a few turns… Are those charged or cached and not charged?
I’m also wondering if instructions use tokens each turn.
It becomes very expensive in my case because I use RAG to provide context as system message which are linked in previous response on each turn which increase token usage so much..
Workaround could be to store conversation and conversation messages and give N last messages as assistant/user messages, but doing this make previous_response_id irrelevant — So I’m not sure it should be the way to go.
Anyone has faced those issues?

_j · May 4, 2025, 3:31am

Yes, you have discovered that Responses is not production grade. Neither its server-side chat state nor its internal tools are ultimately useful.

Repeated inputs still would not be “free”, as your input context just grows and grows.

The cache discount that is offered is that for over 1024 tokens, if routed to the same server, if not expired out in a 5-60 minute service window, you can receive a 50% discount on that input that is in common, the amount under a 128 token segmentation (75% on gpt-4.1).

It would be possible for a backend to persist a prior k-v cache context window with the response id, but that is not what OpenAI offers you. In close analysis of latency trials, one also discovers that there is commonality in cacheable performance when below the threshold to receive a discount, but no discount for you.

Instructions are also input tokens, and altering instructions, which are the first message you can control, would be cache-breaking.

Workaround would be a “threshold” API parameter that actually decides something between the maximum billing or an error when sending a model a million tokens. OpenAI offers no length management. Or simply don’t use the technology, use Chat Completions.

Rishi_Mehra · June 23, 2025, 12:57pm

Hi, I hope you’re doing well. I have a question regarding the use of the previous_response_id parameter in OpenAI.

If I use previous_response_id, will it significantly impact the cost?
Is there an optimized way to pass conversation history manually?
I want to understand — does using previous_response_id reduce the cost compared to when we used to manually pass the entire conversation history in each API call?
Also, does the previous_response_id parameter work with the Chat Completions API method and in Structured Data Parsing as well?

aprendendo.next · June 23, 2025, 1:08pm

it doesn’t affect cost, it just helps in managing conversation by not having to handle the input history all the time.
completions API doesn’t support previous_response_id. Both API’s support structured output, if that’s what you mean.

Rishi_Mehra · June 23, 2025, 1:24pm

Thanks and could you please let me know if there is any limit to how much conversation history is retained when using the previous_response_id parameter?

_j · June 23, 2025, 1:36pm

The stored response is just a way for the entire chat to be resent - without any interface for self-management, without any value you can pass for the budget you want to spend.

The limit is that all conversation is retained, nothing is discarded until you would exceed the total context window of the AI model that you are using. Then you get an API error unless you switch to sending “truncation”: “auto” as an API parameter, which finally allows the chat to continue, only discarding turns at the model’s maximum, not at any setting you can send to limit your cost per turn.

aprendendo.next · June 23, 2025, 1:52pm

Basically, the chosen model’s context limits of input tokens for the accumulated conversation history.

What previous_response_id does is just a backward retrieval of the previous requests to reconstruct a new one internally, but if the chosen model can’t handle the context window, it will fail like it would in any normal request.

saumyanayak · October 25, 2025, 3:56pm

Okay but what if instead of passing previous_response_id i pass the conversation id as mentioned in this - Using the Conversations API section of this link https://platform.openai.com/docs/guides/conversation-state#using-the-conversations-api

What happens internally in this case, in terms of context and billing?

aprendendo.next · October 25, 2025, 4:00pm

It is the same, but instead it will use the stored conversation as inputs.

There is a new endpoint now that can calculate how many input tokens should be billed, without actually doing the request. Notice it will not calculate output tokens though.

Topic		Replies	Views
My experience switching from Assistants API to Responses API Feedback assistants-api	50	13453	September 3, 2025
About System Message on Response API API gpt5 , responses-api , conversation	5	893	November 15, 2025
Getting ChatGPT to Remember Previous Chat Messages Prompting	37	70843	January 29, 2024
Migrating Assistant to Responses - long thread management API	2	777	May 13, 2025
A conversation using the API API	6	3172	December 16, 2023

Responses API: Question about managing conversation state with previous_response_id

Related topics