Migrating Assistants to Responses - long thread management

I’m planning to migrate my codebase from the OpenAI Assistants API to the newer Responses API, and I need to develop a robust strategy for managing thread conversation history, especially for long-running conversations. I’m considering the following approach—do you think this might be a viable strategy?

Step 1: Tracking IDs
Keep track of the response ID returned for each message from OpenAI's Responses API; each one becomes the previous_response_id for the next call.
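
A minimal sketch of what that tracking might look like with the official Python SDK (the send_message helper and the in-memory turn_log are illustrative names of mine, not anything the API mandates):

from openai import OpenAI

client = OpenAI()
turn_log: list[dict] = []   # one entry per turn: {"user": ..., "assistant": ..., "response_id": ...}

def send_message(user_text: str, previous_response_id: str | None) -> str:
    """Send one turn, chaining server-side state via previous_response_id, and log the IDs."""
    resp = client.responses.create(
        model="gpt-4.1",   # placeholder model
        input=user_text,
        previous_response_id=previous_response_id,
    )
    turn_log.append({
        "user": user_text,
        "assistant": resp.output_text,
        "response_id": resp.id,   # becomes previous_response_id for the next turn
    })
    return resp.id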

Step 2: Summary Generation
When you reach a certain number (e.g., every 100 messages), perform the following:

* Use the previous_response_id from an earlier conversation point (e.g., message #60) to prompt OpenAI explicitly for a detailed summary of the conversation history up to that point. (This leverages OpenAI's memory of messages 1–60 implicitly.)

* Store this summary separately as Summary60.
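
Step 2 could then look something like this, reusing the turn_log sketch above. The store=False flag is my assumption, to keep the summary side-call out of any chain you later resume:

def summarise_up_to(turn_index: int) -> str:
    """Ask the model for a summary of everything up to a given turn (e.g. #60),
    by chaining from that turn's stored response ID."""
    resp = client.responses.create(
        model="gpt-4.1",
        previous_response_id=turn_log[turn_index - 1]["response_id"],
        input="Produce a detailed summary of our entire conversation so far, "
              "covering decisions made, open questions, and key facts.",
        store=False,  # assumption: keep this side-call out of the stored chain
    )
    return resp.output_text

summary60 = summarise_up_to(60)  # the "Summary60" described above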

Step 3: Context Reset
Next, gather messages #61–100 (both user and assistant responses), explicitly forming a structured JSON message history.
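
If every turn is already logged locally (as in the Step 1 sketch), building that structured history is a simple transform; the role/content dicts below are the shape the Responses API accepts as input items:

def history_slice(start: int, end: int) -> list[dict]:
    """Turn locally logged turns #start..#end (1-based, inclusive) into
    Responses-style input messages."""
    items = []
    for turn in turn_log[start - 1:end]:
        items.append({"role": "user", "content": turn["user"]})
        items.append({"role": "assistant", "content": turn["assistant"]})
    return items

recent_history = history_slice(61, 100)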

Step 4: Breaking the Chain
For the next message (message #101):

* Do not supply any previous_response_id.

Instead, send OpenAI:
* The freshly obtained Summary60
* The structured JSON history for messages #61–100
* The user's new question (#101)
This explicitly breaks the previous ID chain, starting a fresh context with controlled summarisation.
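
Putting Step 4 together from the pieces above might look like the following. Injecting the summary as a developer message is one reasonable choice rather than the only one, and new_question is a stand-in for message #101:

new_question = "..."  # the user's new question, message #101

reset_input = (
    [{"role": "developer",
      "content": "Summary of the conversation before message #61:\n" + summary60}]
    + recent_history
    + [{"role": "user", "content": new_question}]
)

resp101 = client.responses.create(
    model="gpt-4.1",
    input=reset_input,
    # deliberately no previous_response_id - this starts a fresh chain
)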

Step 5: Resume Standard Chaining
From message #102 onwards:
* Resume using previous_response_id normally, chaining from the newly obtained response (message #101).
* Continue until the message count reaches 100 from the new reset point, then repeat the summarisation and reset process.
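
Step 5 is then just bookkeeping around the Step 1 helper, roughly:

previous_id = resp101.id   # chain from the reset response (#101)
turns_since_reset = 0

def handle_user_message(user_text: str) -> None:
    """Normal operation between resets."""
    global previous_id, turns_since_reset
    previous_id = send_message(user_text, previous_id)
    turns_since_reset += 1
    if turns_since_reset >= 100:
        pass  # repeat Steps 2-4 from this point, then reset the counter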

Looking forward to any thoughts people might have. It would probably be nice if the API had some sort of sliding-window approach, where anything outside the window was represented by an automatically generated summary.


Let me re-supply this in a more approachable manner for the forum - the reformatted steps are exactly as laid out above, so I won't repeat them here.
Who’s giving people 100 messages? 🙂

It seems like it could work - you aren't completely blocked from re-creating conversations the way you are with Assistants. It would work, but it's not entirely practical. OpenAI could just offer a "truncate at token count" option (one that necessarily doesn't strip system messages), instead of their own truncation limit set at 1 million tokens, which by default simply returns an error if you exceed the model's context length.

This looks like a lot of work on your part, and it would degrade the user experience: the response to the user would be delayed whenever this backend processing kicks in. You'd need to handle a complete list of messages, retrieve multiple inputs, and fetch the latest AI output in a separate call; stay system-message- and instruction-aware the whole time (what did you send originally?); and re-ship things like file search results back into more messages. Or, to avoid impacting the experience, you'd instead run maintenance on a conversation that might never be revisited by the user.

All that message manipulation, instead of merely having your own backend for messages? The endpoints below (grouped by function) are what you'd deal with - and that's before you even get to the returned object itself, or to paginating through lists that can't be fetched in parallel:

import os
import requests

API_BASE = "https://api.openai.com/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

# "include" query parameter values supported by both endpoints below -
# specify additional output data to include in the model response:
#   file_search_call.results: the search results of the file search tool call.
#   message.input_image.image_url: image URLs from the input message.
#   computer_call_output.output.image_url: image URLs from the computer call output.
#   reasoning.encrypted_content: an encrypted version of reasoning tokens in reasoning
#       item outputs, so reasoning items can be reused in multi-turn conversations when
#       using the Responses API statelessly (store=false, or an organization enrolled
#       in the zero data retention program).

def get_openai_response(response_id: str, include: list[str] | None = None) -> dict:
    """
    GET https://api.openai.com/v1/responses/{response_id}
    e.g. .../responses/resp_1234?include[]=message.input_image.image_url
         &include[]=computer_call_output.output.image_url&include[]=file_search_call.results
    Returns the parsed JSON response object: a single AI output.
    """
    params = [("include[]", inc) for inc in (include or [])]
    r = requests.get(f"{API_BASE}/responses/{response_id}", headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json()

def get_openai_response_input(
    response_id: str,
    include: list[str] | None = None,
    after: str | None = None,   # an item ID to list items after (pagination)
    before: str | None = None,  # an item ID to list items before (pagination)
    limit: int = 20,            # 1-100 objects per page, default 20
    order: str = "asc",         # "asc" | "desc", default "asc"
) -> dict:
    """
    GET https://api.openai.com/v1/responses/{response_id}/input_items
    Returns the parsed JSON list of input items as "data": the inputs, with no AI output.
    """
    params: list[tuple[str, str]] = [("include[]", inc) for inc in (include or [])]
    for key, value in (("after", after), ("before", before), ("limit", str(limit)), ("order", order)):
        if value is not None:
            params.append((key, value))
    r = requests.get(f"{API_BASE}/responses/{response_id}/input_items", headers=HEADERS, params=params)
    r.raise_for_status()
    return r.json()

Now: why would OpenAI not give an effective truncation strategy for conversation length that could limit your billing?

One possible reason is that Responses is informed by the possibility of a caching architecture for input reuse, for internal cost efficiency: reusing a response ID should make automatic caching easier. But a technology that requires identical starting inputs demands deprioritizing a developer's desire to manage input length themselves.


Thanks for the formatting - the code block I posted into was pretty unreadable. Yeah, 100 is too many; ideally I'd use a token count rather than a number of messages, but there's no easy way to know how many tokens the previous message chain is using (once you consider file uploads etc.). In reality it would probably be 10-20 messages. This is for an intranet ChatGPT client where staff are more likely to come back to a thread each day and just add another question to the ongoing thread.
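
For a rough text-only estimate (which won't account for file uploads, images, or tool outputs - exactly the gap you describe), something like tiktoken can at least bound the text portion of a locally logged history:

import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # the encoding used by recent 4o-class models

def estimate_history_tokens(turns: list[dict]) -> int:
    """Very rough lower bound: counts only logged text, not attachments or tool results."""
    return sum(len(enc.encode(t["user"])) + len(enc.encode(t["assistant"]))
               for t in turns)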

Appreciate your input.