Let me restate this in a more approachable manner for the forum (and clarify what I mean by “summarization”):
Step 1: Tracking IDs
Keep track of every previous_response_id for each message from OpenAI’s Responses API.
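A minimal sketch of that bookkeeping, assuming the official openai Python SDK and a placeholder model name (gpt-4.1 here is just an example):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response_ids: list[str] = []  # one ID per message, in order

def send_turn(user_text: str) -> str:
    """Send one user message, chaining from the most recent stored response ID."""
    response = client.responses.create(
        model="gpt-4.1",  # assumption: any Responses-capable model
        input=user_text,
        previous_response_id=response_ids[-1] if response_ids else None,
    )
    response_ids.append(response.id)  # Step 1: keep every ID
    return response.output_text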
Step 2: Summary Generation
When you reach a certain number (e.g., every 100 messages), perform the following:
- Use the previous_response_id from an earlier conversation point (e.g., message #60) to prompt OpenAI explicitly for a detailed summary of the conversation history up to that point. (This implicitly leverages OpenAI’s stored memory of messages 1–60.)
- Store this summary separately as Summary60 (see the sketch below).
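A rough sketch of that summary call, reusing the client and response_ids list from the Step 1 sketch; the prompt wording and the store=False flag are illustrative assumptions:

def summarize_up_to(anchor_response_id: str) -> str:
    """Ask for a detailed summary of everything up to (and including) the anchored response."""
    summary_response = client.responses.create(
        model="gpt-4.1",
        previous_response_id=anchor_response_id,  # e.g. the ID returned for message #60
        input=("Write a detailed summary of the conversation so far: key facts, "
               "decisions, open questions, and the user's goals."),
        store=False,  # assumption: this side-branch turn never needs to be chained from
    )
    return summary_response.output_text

summary_60 = summarize_up_to(response_ids[59])  # "Summary60", anchored at message #60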
Step 3: Context Reset
Next, gather messages #61–100 (both user and assistant responses), explicitly forming a structured JSON message history.
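One way to build that history, assuming you also keep a plain local transcript of each turn’s text (the transcript structure here is hypothetical):

# transcript = [{"user": "...", "assistant": "..."}, ...]  # maintained alongside response_ids
def build_history(transcript: list[dict], start: int, end: int) -> list[dict]:
    """Turn messages #start..#end (1-indexed) into structured Responses input messages."""
    history: list[dict] = []
    for turn in transcript[start - 1 : end]:
        history.append({"role": "user", "content": turn["user"]})
        history.append({"role": "assistant", "content": turn["assistant"]})
    return history

recent_history = build_history(transcript, 61, 100)  # messages #61-100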
Step 4: Breaking the Chain
For the next message (message #101):
- Do not supply any previous_response_id.
Instead, send OpenAI:
- The freshly obtained Summary60
- The structured JSON history for messages #61–100
- The user’s new question (#101)
This explicitly breaks the previous ID chain, starting a fresh context with controlled summarization (sketched below).
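Sketched with the same assumed helpers, the reset call for message #101 might look like this (carrying the summary in a developer-role message is just one option):

def reset_context(summary: str, recent_history: list[dict], new_question: str) -> str:
    """Message #101: no previous_response_id, so the old server-side chain is dropped."""
    response = client.responses.create(
        model="gpt-4.1",
        input=[
            {"role": "developer", "content": "Summary of the conversation so far:\n" + summary},
            *recent_history,                            # structured messages #61-100
            {"role": "user", "content": new_question},  # the new question, message #101
        ],
        # note: no previous_response_id here - that is what breaks the chain
    )
    response_ids.append(response.id)  # chaining resumes from this ID in Step 5
    return response.output_text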
Step 5: Resume Standard Chaining
From message #102 onwards:
- Resume using previous_response_id normally, chaining from the newly obtained response (message #101).
- Continue this until the message count again reaches 100 messages from the new reset point, then repeat the summarization and reset process (see the sketch below).
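Putting it together, routing each new user message is just a counter check. This ties the earlier sketches together and assumes response_ids and transcript each get exactly one entry per message, plus the helper functions defined above:

window_start = 0   # index into response_ids of message #1 of the current window
SUMMARY_AT = 60    # summarize up to this message within the window
RESET_AT = 100     # break the chain once the window holds this many messages

def handle_user_message(user_text: str) -> str:
    global window_start
    if len(response_ids) - window_start >= RESET_AT:
        # Steps 2-4: summarize the older part of the window, then break the chain
        summary = summarize_up_to(response_ids[window_start + SUMMARY_AT - 1])
        recent = build_history(transcript, window_start + SUMMARY_AT + 1, window_start + RESET_AT)
        answer = reset_context(summary, recent, user_text)
        window_start = len(response_ids) - 1   # this reset turn starts the new window
    else:
        answer = send_turn(user_text)          # Step 5: normal previous_response_id chaining
    transcript.append({"user": user_text, "assistant": answer})
    return answer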
Who’s giving people 100 messages? 
It seems like it could work. You shouldn’t be completely blocked from re-creating a conversation the way you are with Assistants. It would work, but it’s not entirely practical. OpenAI could just offer a “truncate at token count” option (one that necessarily doesn’t strip system messages) instead of their own limit, set at 1 million tokens, which by default simply returns an error if you exceed the model’s context length.
This looks like a lot of work on your part, and it would degrade the user experience: the response to the user gets delayed whenever this backend processing occasionally kicks in. You’d need to:
- Handle a complete list of messages, retrieving multiple inputs, along with the latest AI output in a separate call.
- Stay system-message and instruction-aware at all times (what instructions did you originally send?).
- Re-ship things like file search results back into more messages.
Or, to avoid impacting the experience, run this maintenance in the background on a conversation that might never be revisited by the user.
All that message manipulation, instead of simply keeping your own backend store of messages? The endpoints below (wrapped as functions) are what you’d deal with, and that doesn’t even cover the returned object itself, or paginating through lists that can’t be fetched in parallel:
import os
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def get_openai_response(response_id: str, include: list[str] | None = None) -> dict:
    """
    Sends a GET request to the OpenAI API for the "output" of a response object (a single AI output), returned as parsed JSON.
    - GET https://api.openai.com/v1/responses/resp_1234?include[]=message.input_image.image_url&include[]=computer_call_output.output.image_url&include[]=file_search_call.results
    Optional "include" query parameter - specify additional output data to include in the model response. Currently supported values are:
    file_search_call.results: include the search results of the file search tool call.
    message.input_image.image_url: include image URLs from the input message.
    computer_call_output.output.image_url: include image URLs from the computer call output.
    reasoning.encrypted_content: include an encrypted version of reasoning tokens in reasoning item outputs. This enables reasoning items to be used in multi-turn conversations when using the Responses API statelessly (such as when the store parameter is set to false, or when an organization is enrolled in the zero data retention program).
    """
    resp = requests.get(f"https://api.openai.com/v1/responses/{response_id}", headers=HEADERS,
                        params={"include[]": include} if include else None)
    resp.raise_for_status()
    return resp.json()
def get_openai_response_input(response_id: str, include: list[str] | None = None, **params) -> dict:
    """
    Sends a GET request to the OpenAI API for the "input" list of a response object, returned as "data" (a list of inputs, with no AI output).
    - GET https://api.openai.com/v1/responses/resp_12345/input_items?include[]=message.input_image.image_url&include[]=computer_call_output.output.image_url&include[]=file_search_call.results
    Optional "include" query parameter - same supported values as get_openai_response above.
    Other query parameters:
    after - an item ID to list items after, used in pagination.
    before - an item ID to list items before, used in pagination.
    limit - a limit on the number of objects to be returned, between 1 and 100 (default 20).
    order - (asc | desc) the order to return the input items in; default is asc.
    """
    query = dict(params)
    if include:
        query["include[]"] = include
    resp = requests.get(f"https://api.openai.com/v1/responses/{response_id}/input_items",
                        headers=HEADERS, params=query or None)
    resp.raise_for_status()
    return resp.json()
Now: why wouldn’t OpenAI provide an effective truncation strategy for conversation length, one that could limit your billing?
One possible reason is that Responses is designed around a caching architecture for input reuse, for internal cost efficiency. Reusing a response ID should make automatic caching easier. But a technology that requires identical starting inputs demands deprioritizing a developer’s desire to manage input length themselves.