API reference - Responses, truncation
Issue: the description makes no sense. And frankly, neither does the parameter name.
truncation
- string or null, optional, defaults to "disabled". The truncation strategy to use for the model response.
- auto: If the context of this response and previous ones exceeds the model's context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
- disabled (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.
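For reference, the parameter rides along on the request itself. A minimal sketch of such a request, using the parameter names from the documentation quoted above; since the live behavior is exactly what's in question here, treat this as an untested assumption, and the model name and response ID as placeholders:

```python
# Sketch of a Responses API request enabling "auto" truncation.
# Parameter names follow the quoted documentation; behavior unverified.
payload = {
    "model": "gpt-4o",                      # placeholder model name
    "input": "Continue our long conversation...",
    "previous_response_id": "resp_abc123",  # hypothetical chained response
    "truncation": "auto",  # supposedly: drop middle input items instead of 400
}
# With the official Python SDK, this would be sent as:
#   client.responses.create(**payload)
```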
Truncate: to clip or cut off the end of something. The endpoint does return a truncated output when the model writes up to the end of the context window without finishing. But what does this parameter actually do? The confused writer tells us "the model will truncate the response to fit". Wrong. Cutting off the output would be a strange thing to offer as a feature.
What is the intent?
In fact, "auto" is supposed to discard some in-the-middle chat turns, keeping the input below a token count that would cause an error. The default is to error out instead, but we don't know exactly when that error fires: whether 20 or 2,000 tokens of headroom remain after your input.
Thus: what is the input threshold, how much gets tossed, and how much survives relative to the model's stated context window length? Some reservation of the context window must be held back for forming the output. How much of that memory is "not" input? The model's maximum response length? The max_output_tokens you send? More for a reasoning model, or scaled by its reasoning effort parameter?
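One plausible accounting, sketched below, assumes the output reservation simply equals the request's max_output_tokens. That is an assumption, not documentation; every number here is illustrative:

```python
# Hypothetical budget arithmetic. The actual reservation the API
# makes for the output is undocumented; this is one guess.
context_window = 128_000    # model's advertised context window
max_output_tokens = 4_000   # what the request asks for
input_budget = context_window - max_output_tokens
print(input_budget)  # 124000 tokens of input before "disabled" would 400?
```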
Then: does it act only on discarding stored input, or if I send a million tokens of many prior turns directly in the input, is that also affected? What exactly is the behavior of this "in the middle"? The docs don't say. They only tell us that the default behavior is to maximize the input billed, again without explaining the threshold at which the limit is hit and an error is returned by default (123k of a 125k window?).
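One guess at what "dropping input items in the middle" might mean, sketched locally: keep the first item(s) and the most recent turn, and evict from the middle until a token budget is met. This is entirely speculative; the API's real eviction order is not published.

```python
def drop_middle(items, token_counts, budget, keep_head=1):
    """Evict items closest to the middle (after the first keep_head items,
    never the last) until the total token count fits the budget.
    Speculative model of the documented "auto" behavior."""
    kept = list(range(len(items)))
    total = sum(token_counts)
    while total > budget and len(kept) > keep_head + 1:
        # candidate evictions: everything between the head and the last item
        middle = kept[keep_head:-1]
        victim = middle[len(middle) // 2]  # closest to the middle
        kept.remove(victim)
        total -= token_counts[victim]
    return [items[i] for i in kept], total

turns = ["sys", "t1", "t2", "t3", "t4", "latest"]
costs = [10, 50, 50, 50, 50, 20]
kept, total = drop_middle(turns, costs, budget=150)
# kept -> ["sys", "t1", "t4", "latest"], total -> 130
```

Even a toy like this exposes the open questions: which items are protected, what the budget actually is, and whether eviction is turn-by-turn or item-by-item.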
I'd try to answer, but that would take real work: exploration and inference based on probing the API with large inputs.
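That probing could at least be mechanized: binary-search over input size for the point where requests start failing. Sketched here against an injected request function so it runs offline; swapping in a real API call (and paying for the tokens) would yield the actual number.

```python
def find_error_threshold(request_ok, lo=0, hi=200_000):
    """Binary-search the largest input token count that request_ok accepts.
    request_ok(n) -> bool; with a real client it would send n tokens of
    filler text and report whether the API returned a 400."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if request_ok(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

# Offline stand-in: pretend the API errors above 123,000 input tokens.
threshold = find_error_threshold(lambda n: n <= 123_000)
# threshold -> 123000
```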
Stored response persistence
Documentation says:
Response objects are saved for 30 days by default. They can be viewed in the dashboard logs page or retrieved via the API. You can disable this behavior by setting store to false when creating a Response.
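As documented, opting out would look like this (parameter name taken from the quote above; whether it actually prevents the retained log entry is the open question):

```python
# Request payload sketch with storage disabled, per the documentation.
payload = {
    "model": "gpt-4o",   # placeholder model name
    "input": "An ephemeral question",
    "store": False,      # docs claim this opts out of the 30-day retention
}
# client.responses.create(**payload)
```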
Evidence in the Responses logs says otherwise, suggesting that the actual "default" so far is "forever"…
I have more I could write, but there’s nobody reading, apparently.