New tools for building agents: Responses API, web search, file search, computer use, and Agents SDK

Noteworthy: the server-side “conversation state” feature introduced with the Responses API. If you use this feature, passing a `previous_response_id` along with only the latest input, there is no cost management as the chat grows in length:

You get only one of two outcomes:

  • maximum loading of the context window (paying for on the order of 120k input tokens per turn?), or
  • an error thrown.

And if you opt into the mode where it automatically discards earlier turns, every new input also destroys any context-window caching.
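To make the two approaches concrete, here is a minimal sketch of the request payloads for client-managed versus server-managed state, using the documented `input`, `previous_response_id`, and `store` fields. The model name and message shapes are placeholders, not a recommendation:

```python
def self_managed_request(history, user_msg):
    """Client-managed state: resend the whole transcript every turn,
    so you control (and pay for) exactly what goes in the window."""
    return {
        "model": "gpt-4o",  # placeholder model name
        "input": history + [{"role": "user", "content": user_msg}],
        "store": False,  # see note below on disabling server-side storage
    }

def server_managed_request(prev_id, user_msg):
    """Server-side state: send only the latest input plus the previous
    response ID; the server reconstructs the rest of the conversation."""
    return {
        "model": "gpt-4o",
        "input": [{"role": "user", "content": user_msg}],
        "previous_response_id": prev_id,
    }
```

The point of the complaint above: in the server-managed form there is no field to cap how much of the reconstructed history you are billed for each turn.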

From the API reference, `truncation`: The truncation strategy to use for the model response.

* `auto`: If the context of this response and previous ones exceeds the model's context window size, the model will truncate the response to fit the context window by dropping input items in the middle of the conversation.
* `disabled` (default): If a model response will exceed the context window size for a model, the request will fail with a 400 error.
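As a sketch, opting into the middle-drop behavior is just one extra field on the request; the default `"disabled"` instead fails with a 400 once the window is exceeded. The response ID here is hypothetical:

```python
# Request payload opting into automatic truncation. With the default
# ("disabled"), an over-length conversation fails with a 400 error;
# with "auto", the server silently drops input items from the middle.
request = {
    "model": "gpt-4o",  # placeholder model name
    "previous_response_id": "resp_abc123",  # hypothetical prior response ID
    "input": [{"role": "user", "content": "And then what happened?"}],
    "truncation": "auto",
}
```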

Thus, this conversation state is not practical to use in its current implementation unless you want to cap your users' chat sessions at a limited number of turns, or are willing to set money on fire.


Additional: "store": false must be set by you when doing your own chat self-management as before, otherwise you get the server-side storage of chat session under a multitude of model response IDs as the default, consuming resources and perhaps response time.

Todo: compare the network latency of 10-200 kB chat requests against the latency of backend response_id retrieval when this is being more heavily utilized - like Assistants.