I am working on a chatbot application that uses the Responses API.
For conversation state, I am using the Conversations API instead of passing the previous_response_id param: I create one conversation per user, once, and pass it as the conversation param to the client.responses.create method.
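For context, this is roughly what that setup looks like (a minimal sketch assuming the Python SDK; method names like client.conversations.create may vary with your SDK version):

```python
from openai import OpenAI

client = OpenAI()

# Created once per user and persisted alongside the user record.
conversation = client.conversations.create()

# Every subsequent turn references the same conversation id.
response = client.responses.create(
    model="gpt-4.1",
    conversation=conversation.id,
    input=[{"role": "user", "content": "Hello!"}],
)
print(response.output_text)
```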
My question is: given that I couldn’t find anything about conversation pruning in the docs, am I supposed to manage pruning manually by calling delete-item on older items in a conversation?
If I don’t, will the context given to the LLM just keep increasing as new messages are added until the context window limit of the model is reached, or is there any other type of pruning employed?
The point is that my application doesn’t expose explicit conversation management to the user (just a single thread, no “new chat” button - although it wouldn’t be a problem for the user to eventually lose access to very old messages as they move out of some “time window”), so I’m a bit wary of keeping a single conversation object per user forever.
I am having a similar issue. What I did was set a Redis key with a 15-minute TTL for each conversation; when the Redis key expires, I trigger deletion of the conversation.
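If it helps, here is a rough sketch of that pattern (assuming redis-py with keyspace notifications enabled, and that the SDK exposes client.conversations.delete; adjust names to your own setup):

```python
import redis
from openai import OpenAI

client = OpenAI()
r = redis.Redis()

CONV_TTL_SECONDS = 15 * 60

def track_conversation(conversation_id: str) -> None:
    # Encode the conversation id in the key name: the value is already gone
    # by the time the expiry event fires, but the key name is still delivered.
    r.set(f"conv:{conversation_id}", 1, ex=CONV_TTL_SECONDS)

def reap_expired_conversations() -> None:
    # Requires keyspace notifications: r.config_set("notify-keyspace-events", "Ex")
    pubsub = r.pubsub()
    pubsub.psubscribe("__keyevent@0__:expired")
    for msg in pubsub.listen():
        if msg["type"] != "pmessage":
            continue
        key = msg["data"].decode()
        if key.startswith("conv:"):
            client.conversations.delete(key.removeprefix("conv:"))
```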
No mechanism is offered to fit conversation length to a budget. This shortcoming makes server-side conversation state on Responses unsuitable for use, whether through reuse of a response ID or through a “conversation”.
There is a “truncation” parameter. It decides whether you get an error when the conversation is larger than the model’s input context, or whether, only at that maximum, turns are dropped from the beginning based on the model’s context window.
If OpenAI can run their truncation method against the model’s context window length, they can darn well run it against your own maximum-input parameter. They don’t, though.
Manually deleting messages is data loss. If you want a conversation to run on gpt-4, you’d have to prune the stored input plus the latest input down to about 6k tokens so there is room to form a response. Then you switch to gpt-4.1, and you’ve already deleted the million tokens of input you could have sent (and been billed for) on that model.
The best course is not to use the offered server-side conversation state until they come up with a budget setting that is also cache-aware: one that determines which turns are candidates for advancing a cutoff pointer in large increments, by model, by expiry time, by cache discount.
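As a stopgap, the budget-aware pruning can live on the client instead; a minimal sketch (the token estimate and the 30k budget are illustrative assumptions, not anything the API provides):

```python
from openai import OpenAI

client = OpenAI()

MAX_INPUT_TOKENS = 30_000  # illustrative budget; tune per model

def rough_tokens(text: str) -> int:
    # Crude estimate (~4 characters per token); swap in tiktoken for accuracy.
    return len(text) // 4

def build_input(history: list[dict], user_text: str) -> list[dict]:
    """Keep the newest turns that fit under the budget; drop the oldest."""
    messages = history + [{"role": "user", "content": user_text}]
    while len(messages) > 1 and sum(
        rough_tokens(m["content"]) for m in messages
    ) > MAX_INPUT_TOKENS:
        messages.pop(0)  # drop the oldest turn first
    return messages

history: list[dict] = []  # kept in your own store, not on the server
user_text = "What did we decide yesterday?"

response = client.responses.create(
    model="gpt-4.1",
    input=build_input(history, user_text),
    store=False,  # nothing accumulates server-side
)
history.append({"role": "user", "content": user_text})
history.append({"role": "assistant", "content": response.output_text})
```

The full history stays in your own store, so nothing is lost when you later switch to a model with a bigger context window.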
If you do want to use a conversation, there isn’t even the Assistants endpoint’s maximum-number-of-turns threshold.
A lot of wasted effort to not deliver practicality.
There is an automatic process for deleting messages after 30 days.
There is also an option, disabled or auto, for automatically truncating responses when the input exceeds the model’s context window size. This feature is a good start but limited. For instance, I would like more control over which types of messages are truncated - tool messages are kept and can be quite large.
I believe the AI Agent API offers more options in relation to this.
This is an excerpt from the Responses API documentation - Response Object section:
truncation (string or null)
The truncation strategy to use for the model response.
- auto: If the input to this Response exceeds the model’s context window size, the model will truncate the response to fit the context window by dropping items from the beginning of the conversation.
- disabled (default): If the input size will exceed the context window size for a model, the request will fail with a 400 error.
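In practice, a call like the following opts into the drop-from-the-beginning behaviour instead of the 400 error (a minimal sketch, assuming the Python SDK; the conversation id is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    conversation="conv_123",   # placeholder conversation id
    input="Continue where we left off.",
    truncation="auto",         # default is "disabled" -> 400 on overflow
)
print(response.output_text)
```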
And this is from the Microsoft Learn site - Responses API - Delete Response section:
When you keep a single conversation id, OpenAI stores every turn. The model will only see content that fits its context window—older text is dropped at runtime—but the stored conversation keeps growing.
Best practice:
- Keep a small rolling window and summarize older parts.
- Delete unneeded items with conversation.item.delete.
- Optionally set truncation: "auto" as a safety net.
- Use store: false for exchanges you don’t need to keep.
This keeps token usage, cost, and latency predictable and avoids uncontrolled growth of long-running chats.
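A rough sketch of the rolling-window-plus-delete part of that advice (assuming the Python SDK exposes conversation items under client.conversations.items; exact method names, argument order, and list ordering may differ):

```python
from openai import OpenAI

client = OpenAI()

KEEP_LAST_ITEMS = 20  # illustrative rolling-window size

def prune_conversation(conversation_id: str) -> None:
    """Delete everything older than the newest KEEP_LAST_ITEMS items."""
    items = list(client.conversations.items.list(conversation_id))
    # Assumption: items come back newest-first; flip the slice if they don't.
    for item in items[KEEP_LAST_ITEMS:]:
        client.conversations.items.delete(
            item_id=item.id,
            conversation_id=conversation_id,
        )

# Called after each turn, alongside truncation="auto" as the safety net.
prune_conversation("conv_123")
```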
Sorry, ChatGPT, but that answer only repeats what was already seen in this forum topic, and it rests on an incorrect assumption.
You would not want to damage a conversation that may need to be retrieved for customer presentation. You would not want to damage a conversation that may later be run against models with different context lengths.
What you would want is an effective “truncation” interface that does better than what OpenAI offers: one that takes some kind of budget parameter, short of running a million tokens of old chat per turn, per internal iteration.