Responses API: Question about managing conversation state with previous_response_id

I’ve been tinkering and want to make sure I’m tracking correctly. Am I right that managing conversation state on the server side by passing previous_response_id only carries over the last message?

Am I missing something? I’m looking for a way to extend this functionality such that I could, for example, pass a single parameter that would fetch the entire history of the conversation, not just the last message.

It seems to get the whole thing. I tried it just now:

User: Hello! This is a test of whether my script’s history is working. In a few turns, I’ll ask you what the magic word is (it’s banana). Don’t mention it until then!

Assistant: Got it! I’ll wait for you to ask.

User: In the meantime, how are you today?

Assistant: I’m just a program, but I’m here and ready to help! How are you doing?

User: Doing OK I guess.

Assistant: I’m glad to hear that! If there’s anything you’d like to chat about or need help with, feel free to let me know.

User: What was the magic word?

Assistant: The magic word is banana.

4 Likes

Interesting. I was testing last night and did not get this result. My test case was a little more convoluted though. I’ll have to try this out the way you did. Thank you!

1 Like

Here’s the code I used just in case.

from openai import OpenAI

api_key = "REDACTED"
client = OpenAI(api_key=api_key)

last_id = None  # id of the most recent response; None on the first turn
while True:
    user_input = input("\n> ")
    if user_input.lower() in ["exit", "quit"]:
        break
    if last_id:
        # Chain onto the previous response so the server replays the history
        response = client.responses.create(
            model="gpt-4o", input=user_input, previous_response_id=last_id
        )
    else:
        response = client.responses.create(model="gpt-4o", input=user_input)
    last_id = response.id
    print("\n" + response.output_text)
1 Like

Thanks! I was able to replicate this on my end, so indeed it appears to be grabbing the whole history. Appreciate your assistance.

2 Likes

Are all previous tokens in the conversation charged as input tokens in this case? (i.e., the same as manually managing state with the Chat Completions API)

3 Likes

What if I don’t want the whole history, but only the last n messages? How can I do that?

3 Likes

Are all previous tokens in the conversation charged as input tokens in this case?

The docs say “Even when using previous_response_id, all previous input tokens for responses in the chain are billed as input tokens in the API.”
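
You can watch that growth turn by turn by printing the usage object on each response. A minimal sketch, reusing client, user_input, and last_id from the script above (field names as the current Python SDK exposes them):

response = client.responses.create(
    model="gpt-4o",
    input=user_input,
    previous_response_id=last_id,
)
# input_tokens includes the whole replayed history, so it grows every turn
print(response.usage.input_tokens, "input /", response.usage.output_tokens, "output")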

2 Likes

I suppose eventually it should support truncation strategies similar to the Assistants API?

2 Likes

Hey, thanks for responding in this thread.

I’m trying to wrap my head around this, but struggling.

What I can’t understand is the context of the input tokens being passed into each turn when using previous_response_id.

For example, in my use case I am using file search in the Assistants API. Every turn passes the chunks received from retrieval plus the user message.

Now, if in the Responses API previous_response_id accounts for EVERY input, does that mean each turn’s retrieval chunks are all re-sent as input too? If so, that’s ridiculous.

Any help would be great, please!

1 Like

Ok, so I think I get it, but it would be good to know how OpenAI is handling it.

So, if you want consistent memory as a thread:

  1. Make the initial call (first message).
  2. Get the response (store its id).
  3. Make the second call, passing that id as previous_response_id.

and so forth.

It must be a chain, so on each turn you must pass the previous response’s id as previous_response_id.
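
A minimal sketch of that chain, assuming the same client setup as the script above:

# Turn 1: no previous_response_id yet
first = client.responses.create(model="gpt-4o", input="Hello")

# Turn 2: pass turn 1's id so the server replays its history
second = client.responses.create(
    model="gpt-4o",
    input="Follow-up question",
    previous_response_id=first.id,
)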

It would be good to know how OpenAI links all those responses together.

If I’m completely honest, having a thread_id is a lot simpler. Can anyone see any benefit in the above workflow over having a thread_id?

Also, it would be good to know whether the “instructions” count as input tokens on each turn.

Also, from testing I can see how this can get very messy in the API dashboard under Logs. It’s very difficult to follow a conversation.

4 Likes

Indeed, on each call we have to provide a previous_response_id to act as a conversation, but this increases token usage so much. A first response that uses ~3,000 tokens grows to ~15,000 tokens after a few turns. Are those tokens charged, or cached and not charged?
I’m also wondering whether the instructions consume tokens on each turn.
It becomes very expensive in my case because I use RAG to provide context as a system message, which gets linked into the previous response on each turn and inflates token usage considerably.
A workaround could be to store the conversation and its messages and send only the last N as assistant/user messages, but doing this makes previous_response_id irrelevant, so I’m not sure it’s the way to go.
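For what it’s worth, a minimal sketch of that workaround (history, N, and ask are illustrative names, not part of the API):

from openai import OpenAI

client = OpenAI()
history = []  # the full transcript, managed by us
N = 6         # how many recent messages to send each turn

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    # Send only a window of the history; previous_response_id is not used
    response = client.responses.create(model="gpt-4o", input=history[-N:])
    history.append({"role": "assistant", "content": response.output_text})
    return response.output_text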
Has anyone faced these issues?

1 Like

Yes, you have discovered that Responses is not production grade. Neither its server-side chat state nor its internal tools are ultimately useful.

Repeated inputs still would not be “free”, as your input context just grows and grows.

The cache discount on offer works like this: for input over 1,024 tokens, if the request is routed to the same server and hasn’t expired out of a 5-60 minute service window, you receive a 50% discount on the input prefix that is in common, measured in 128-token increments (75% on gpt-4.1).

It would be possible for a backend to persist a prior k-v cache context window with the response id, but that is not what OpenAI offers you. Close analysis of latency trials also reveals cache-like performance below the threshold needed to receive a discount, but no discount for you.
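
You can at least check what the cache actually served on a given call. A sketch, reusing names from the script earlier in the thread and assuming your SDK version exposes input_tokens_details on the usage object as the current docs describe:

response = client.responses.create(
    model="gpt-4o",
    input=user_input,
    previous_response_id=last_id,
)
# cached_tokens is the portion of the input billed at the discounted rate
print(response.usage.input_tokens, "input,",
      response.usage.input_tokens_details.cached_tokens, "cached")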

Instructions are also input tokens, and altering instructions, which are the first message you can control, would be cache-breaking.

A real workaround would be a “threshold” API parameter that let you choose something other than maximum billing or an error when sending a model a million tokens, but OpenAI offers no such length management. Or simply don’t use the technology; use Chat Completions.

1 Like

Hi, I hope you’re doing well. I have a question regarding the use of the previous_response_id parameter in OpenAI.

  1. If I use previous_response_id, will it significantly impact the cost?
    Is there an optimized way to pass conversation history manually?
    I want to understand: does using previous_response_id reduce the cost compared to manually passing the entire conversation history in each API call?
  2. Also, does the previous_response_id parameter work with the Chat Completions API, and with structured data parsing as well?

1 Like

  1. It doesn’t reduce the cost; you are billed for the replayed history either way. It just saves you from having to manage the input history yourself.
  2. The Chat Completions API doesn’t support previous_response_id. Both APIs support structured output, if that’s what you mean.
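For comparison, manual state management with Chat Completions looks like this (a minimal sketch mirroring the Responses script earlier in the thread):

from openai import OpenAI

client = OpenAI()
messages = []  # we own the history; nothing is stored server-side

while True:
    user_input = input("\n> ")
    if user_input.lower() in ["exit", "quit"]:
        break
    messages.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = completion.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print("\n" + reply)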
2 Likes

Thanks! Could you please let me know whether there is any limit to how much conversation history is retained when using the previous_response_id parameter?

1 Like

The stored response is just a way for the entire chat to be resent, with no interface for self-management and no value you can pass for the budget you want to spend.

The limit is that the entire conversation is retained; nothing is discarded until you would exceed the total context window of the model you are using. Then you get an API error, unless you switch to sending “truncation”: “auto” as an API parameter, which finally allows the chat to continue, discarding turns only at the model’s maximum, not at any setting you can pass to limit your cost per turn.
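For example, the chained call from the script earlier in the thread becomes:

response = client.responses.create(
    model="gpt-4o",
    input=user_input,
    previous_response_id=last_id,
    truncation="auto",  # drop oldest turns instead of erroring at the context limit
)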

1 Like

Basically, the limit is the chosen model’s input-token context limit, applied to the accumulated conversation history.

What previous_response_id does is just retrieve the previous requests and reconstruct a new one internally; if the result exceeds what the chosen model’s context window can handle, it will fail just as it would in any normal request.

1 Like