Hi, I’m migrating from the Assistants API to the Prompts (Responses) API.
With Assistants I used to set truncation_strategy={"type": "last_messages", "last_messages": N} on runs, but I see that the Responses API's truncation parameter only supports two options (auto and disabled).
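For reference, a minimal sketch of the old call versus the new one (thread_id, assistant_id, and the N of 10 are placeholders for my real values):

```python
from openai import OpenAI

client = OpenAI()

# Assistants API: keep only the last N messages in the model's context.
run = client.beta.threads.runs.create(
    thread_id=thread_id,        # placeholder
    assistant_id=assistant_id,  # placeholder
    truncation_strategy={"type": "last_messages", "last_messages": 10},
)

# Responses API: truncation only accepts "auto" or "disabled".
response = client.responses.create(
    model="gpt-4.1",
    input="Hello",
    truncation="auto",  # no last-N equivalent exists
)
```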
Is there any way to achieve the same behavior with the responses API?
This is a gap in the endpoint that I identified and raised within an hour of its announcement. It has not changed.
The closest you can get is actually more work than managing the entire chat history and model inputs yourself: use a “conversation” object as your server-side storage instead of chaining previous response IDs, then list that conversation's items with every option for surfacing internal objects, such as reasoning (summary) items and tool calls. From there you delete and alter that list of items, treating it as your source of truth about the conversation while pruning it down to a particular context budget, as sketched below.
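A sketch of that pruning loop, assuming the Python SDK's conversations.items list/delete helpers keep their current shape (verify the signatures against your SDK version); the character-count budget is a stand-in for a real tokenizer:

```python
from openai import OpenAI

client = OpenAI()

# A conversation object as server-side storage, instead of previous_response_id.
conv = client.conversations.create()

# ...run turns against it with client.responses.create(conversation=conv.id, ...)

# List every item: user/assistant messages, reasoning items, tool calls.
items = list(client.conversations.items.list(conv.id, order="asc", limit=100))

# Delete the oldest items until a rough budget is met. Character count is a
# placeholder heuristic; a real implementation should count tokens.
BUDGET_CHARS = 40_000
total = sum(len(str(item)) for item in items)
for item in items:
    if total <= BUDGET_CHARS:
        break
    client.conversations.items.delete(item.id, conversation_id=conv.id)
    total -= len(str(item))
```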
So your choice: “disabled”, where the input grows, at up to $0.50 per call or even per tool call, until it finally overflows the context window and produces an error; or “auto”, where server-side turn-based truncation damages the cache on every input, so you pay full price to re-run a maximum-length input each time.
The only real choice is still self-management, unless simply stopping chats at a maximum number of turns, or at a maximum cost for the last call, is acceptable as your user interface.
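If you do self-manage, replicating last_messages is short work: keep the history client-side, slice it, and send it fresh each turn with storage off. A minimal sketch (it prunes by raw item count; keeping tool calls paired with their outputs across the cut is left to you):

```python
from openai import OpenAI

client = OpenAI()
history: list[dict] = []  # your own source of truth for the chat
LAST_MESSAGES = 10        # the N you previously gave truncation_strategy

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.responses.create(
        model="gpt-4.1",
        input=history[-LAST_MESSAGES:],  # replicate "last_messages" yourself
        store=False,                     # no server-side state at all
    )
    history.append({"role": "assistant", "content": response.output_text})
    return response.output_text
```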