While integrating compaction into our workflow, we encountered some ambiguity. The docs tell us to send the entirety of the conversation so far as input for the compaction, yet there is also a previous_response_id available on the request. We assumed we could simply hit the /compact endpoint with that id and get the compacted response object back, but the docs are unclear:
The unique ID of the previous response to the model. Use this to create multi-turn conversations. Learn more about conversation state. Cannot be used in conjunction with conversation.
What does it mean to use this to create “multi-turn conversations”? In this context, wouldn’t it make more sense to simply use the cached response of that previous id as input for the compaction?
Compaction is essentially a tool for self-management of conversation, but it becomes a platform lock-in of that self-management if you use the mechanism and discard the original conversation.
It is indeed something you’d want to do only when the cache has already been broken, such as with an idled conversation after any cache would have timed out, or when the context absolutely can’t be run due to length.
A previous response ID as conversation state doesn’t do much to encourage better cache utilization; it is just a different way to send messages. In fact, if you are using “truncation”: “auto”, it is the API deciding when to discard messages and break the cache on you.
I assume it was a (poor) copy-paste from the create endpoint’s parameter description. It could definitely use better wording.
Anyways, I think what they really meant is that you can use the previous_response_id parameter instead of passing the whole input again; that would be it.
Yes, you can have it produce output based on the ID of a previous response.
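For anyone following along, here is a minimal sketch of that call shape. The transport is injected so it runs offline; previous_response_id is the documented parameter, but the rest of the payload and the stubbed return values are my assumptions, not verified API behavior:

```python
# Sketch: continuing a conversation via previous_response_id instead of
# resending the whole input each turn. Payload fields beyond the documented
# ones are assumptions.

def next_turn(post, model, user_text, prev_id=None):
    payload = {"model": model, "input": [{"role": "user", "content": user_text}]}
    if prev_id:
        # The server reconstructs earlier turns from its stored chain.
        payload["previous_response_id"] = prev_id
    return post("/v1/responses", payload)

# Offline stand-in for the API, just to show the call shape.
def fake_post(path, payload):
    return {"id": "resp_2",
            "previous_response_id": payload.get("previous_response_id")}

resp = next_turn(fake_post, "gpt-4.1", "continue the analysis", prev_id="resp_1")
```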
One might think it could then modify that stored response. Retrieve it and see if the messages are gone, i.e. whether it is not a permanently stateful ID?
Then you’d have to run that compaction as one turn of conversation input, getting pretty close to “why don’t I do this all myself”.
One other curiosity: you’d think that a response ID is a discrete snapshot of the input that was run. But no. You will find instead that a response ID is a chain of references to previous response IDs. Delete one of the earlier IDs (if you are collecting them all for a chat session, like you could just collect all the messages) - you then have a conversation where the input history stops where the chain is broken. Perhaps that’s an easier “compaction”: discard some of the chain by killing old IDs on demand, whenever you feel it is a good time to truncate - as long as it keeps working that way.
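A toy model of the chaining behavior described above: each response stores only a pointer to its predecessor plus its own new messages, and reconstructing history walks the chain. This is an illustration of the observed behavior, not OpenAI’s actual implementation:

```python
# response_id -> (previous_response_id, messages added by that turn)
store = {}

def add_response(rid, prev_id, messages):
    store[rid] = (prev_id, messages)

def history(rid):
    """Walk the chain backwards; stop where a link is missing (deleted)."""
    msgs = []
    while rid in store:
        prev, new = store[rid]
        msgs = new + msgs
        rid = prev
    return msgs

add_response("r1", None, ["u1", "a1"])
add_response("r2", "r1", ["u2", "a2"])
add_response("r3", "r2", ["u3", "a3"])
assert history("r3") == ["u1", "a1", "u2", "a2", "u3", "a3"]

del store["r1"]  # delete an earlier ID...
# ...and the input history now stops where the chain is broken:
assert history("r3") == ["u2", "a2", "u3", "a3"]
```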
Oh, that’s what I was thinking too, which was the main purpose of my question: does it work this way? Has anyone successfully used it with that previous response ID?
Just reporting back: it seems to work as @aprendendo.next reported, so I really appreciate it. It threw me off that the object that came back was just inputs. I know the docs say so, but I thought maybe the compactions would be in between the inputs - instead it was just the last object (an encrypted object).
OAI Logs in the Dashboard also not reporting compaction threw me off a bit - they only report back a string of inputs.
On a follow-up request, I referred to info “lost” in the encrypted item, and it came back 100% accurate. It will be interesting to see whether there are true savings here given our current approach (using previous_response_id). It’s not apparent from this first test, and I’d love more data from OAI on it, but again, thanks for the help.
The main thing to consider: scheduling a compaction run powered by AI doesn’t really fit any pattern of “excellence”.
Are you going to do it preemptively, just in case someone doesn’t abandon a chat but instead revisits it? “Paying it forward”.
Are you going to make someone wait while an AI generates thousands of tokens that are not thinking or a response?
Are you switching models with different context lengths? Can you know when to do it if you are switching between gpt-4.1 (1M) and gpt-4o (<128k)? There’s no “here’s how much to keep, or don’t drop anything at all” parameter.
Are you going to schedule it with knowledge of when the cache is already broken by timeout, broken on a particular model or not existing on another? Are you using server-side conversation storage exclusively, needing to retrieve messages and count tokens to even inform your calls to this endpoint?
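If you do schedule compaction yourself, you at least need model-aware limits. A rough sketch: the context-window figures for gpt-4.1 and gpt-4o are from the discussion above, and the 80% trigger threshold is an arbitrary choice of mine:

```python
# Approximate context windows; treat these figures as assumptions to verify
# against the current model docs.
CONTEXT_WINDOW = {
    "gpt-4.1": 1_000_000,  # ~1M tokens
    "gpt-4o": 128_000,     # <128k tokens
}

def should_compact(token_count, model, threshold=0.8):
    """Decide per-model whether the conversation is near its limit."""
    limit = CONTEXT_WINDOW.get(model)
    if limit is None:
        raise ValueError(f"unknown model: {model}")
    return token_count >= threshold * limit

assert should_compact(110_000, "gpt-4o")       # near the 128k limit
assert not should_compact(110_000, "gpt-4.1")  # a tiny fraction of 1M
```

Note this only answers the “when”; as pointed out above, there is still no parameter telling the endpoint how much to keep.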
The main thing I would do, since you cannot observe the quality delivered: only pass the oldest turns for compaction. Even then you’d still potentially destroy “here’s the code base we’ll be discussing this whole chat” inputs with summarization.
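One way to implement “only pass the oldest turns”: split the window, hand only the head to compaction, and keep the recent tail verbatim. The keep_recent count here is an arbitrary choice, and the caveat above about destroying early pinned context still applies:

```python
def split_for_compaction(window, keep_recent=6):
    """Return (oldest turns to compact, recent tail to keep verbatim)."""
    if len(window) <= keep_recent:
        return [], window  # nothing old enough to compact yet
    return window[:-keep_recent], window[-keep_recent:]

head, tail = split_for_compaction(list(range(10)), keep_recent=6)
# head holds the 4 oldest items, tail the 6 most recent
```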
For long-running conversations with the Responses API, you can use the /responses/compact endpoint to shrink the context you send with each turn.
Compaction is stateless: you send the full window to the endpoint, and it returns a compacted window that you provide in the next /responses call.
All prior user messages are kept verbatim.
Prior assistant messages, tool calls, tool results, and encrypted reasoning are replaced with a single encrypted compaction item that preserves the model’s latent understanding while remaining opaque and ZDR-compatible.
Usage flow
Send Responses requests as usual with user messages, assistant replies, and tool interactions.
When the context window grows large, call /responses/compact with the full window (it must still fit within the model’s max context size).
Use the returned compacted window as the input for the next /responses request and continue the workflow.
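The flow above can be sketched with an injected HTTP function so it runs offline. The endpoint path matches the docs quoted here, but the length check is a crude stand-in for real token counting, and the shape of the returned object (an "input" list with user messages kept and the rest collapsed into one encrypted item) is my assumption based on reports earlier in this thread:

```python
def maybe_compact(post, model, window, max_items=50):
    """Step 2: when the window grows large, swap it for the compacted one."""
    if len(window) <= max_items:  # crude proxy; count tokens in practice
        return window
    out = post("/v1/responses/compact", {"model": model, "input": window})
    return out["input"]           # assumed field holding the compacted window

# Offline stand-in mimicking the documented behavior: user messages kept
# verbatim, everything else replaced by a single encrypted compaction item.
def fake_post(path, payload):
    kept = [m for m in payload["input"] if m.get("role") == "user"]
    return {"input": kept + [{"type": "compaction", "encrypted_content": "..."}]}

window = [{"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
          for i in range(60)]
compacted = maybe_compact(fake_post, "gpt-4.1", window)
# 60 items shrink to 30 user messages plus 1 compaction item
```

The compacted list would then be passed as the input of the next /responses request (step 3).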
Instructions (optional)
The instructions field lets you include a system-style message that applies only to the compaction request. We recommend using this field only if you also supply instructions when creating responses, and ensuring that the same instructions are passed to both the Responses and Compact endpoints.
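That recommendation can be enforced by building both payloads from one source of truth, so the Responses and Compact calls always carry identical instructions. The instructions field is documented above; everything else here is a sketch:

```python
def paired_payloads(model, window, instructions):
    """Build matching payloads for /responses and /responses/compact."""
    base = {"model": model, "input": window, "instructions": instructions}
    return dict(base), dict(base)  # separate copies, identical instructions

create_p, compact_p = paired_payloads("gpt-4.1", [], "Answer tersely.")
```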