Better performance from reasoning models using the Responses API (cookbook)

I am unclear about when the reasoning output can be reused and when it cannot.

In the cookbook one builds a context using a first request to the API.

Is that how we are supposed to do it in production: make a first request that returns a tool call, then make another request with the first request's history plus its response? Or is the history, including the reasoning output, saved in state, so that we only need to pass the ID of the response containing the reasoning output as context?
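For concreteness, the two patterns I have in mind look roughly like this (a sketch only; the model name and the get_weather tool are placeholders I made up, not taken from the cookbook):

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

history = [{"role": "user", "content": "What's the weather in Paris?"}]

# First request: the output may contain reasoning items plus a function_call item.
first = client.responses.create(model="o4-mini", input=history, tools=tools)
call = next(item for item in first.output if item.type == "function_call")
tool_result = '{"temp_c": 18}'  # placeholder for the real tool's output

# Pattern 1 (resend everything): append the first response's output, including
# its reasoning items, plus the tool result, and send the whole history again.
history += first.output
history.append({"type": "function_call_output",
                "call_id": call.call_id,
                "output": tool_result})
second = client.responses.create(model="o4-mini", input=history, tools=tools)

# Pattern 2 (server-side state; store=True is the default): send only the new
# item and reference the stored response, so its reasoning is reused from state.
second = client.responses.create(
    model="o4-mini",
    previous_response_id=first.id,
    input=[{"type": "function_call_output",
            "call_id": call.call_id,
            "output": tool_result}],
    tools=tools,
)
```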


The “include” Responses API parameter (a sketch of stateless use follows the quoted description):

Specify additional output data to include in the model response. Currently supported values are:

  • reasoning.encrypted_content: Includes an encrypted version of reasoning tokens in reasoning item outputs. This enables reasoning items to be used in multi-turn conversations when using the Responses API statelessly (like when the store parameter is set to false, or when an organization is enrolled in the zero data retention program).
    ..
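A minimal sketch of what that stateless use looks like in the Python SDK (the model name and prompts are mine, not from the reference): with store set to false, the encrypted reasoning travels inside the reasoning items you get back, and you carry it forward yourself by placing those items into the next request's input.

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(
    model="o4-mini",
    input=[{"role": "user", "content": "Outline a 3-step refactor of this module."}],
    store=False,                              # nothing is retained server-side
    include=["reasoning.encrypted_content"],  # reasoning returned in encrypted form
)

# Feed the prior output (reasoning items included) back in yourself; that is
# what lets the model pick up its earlier thinking on the next turn.
second = client.responses.create(
    model="o4-mini",
    input=list(first.output) + [{"role": "user", "content": "Now draft step 1."}],
    store=False,
    include=["reasoning.encrypted_content"],
)
```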

Then, for returning what the assistant produced, it is not an “assistant” message that gets constructed, but one of many “item” types, which take a dozen expansions in the API reference to document:

“id” requires “store” to be set, even if you are not using the stateful server-side response.
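Walking the output looks roughly like this (common item types only; a sketch, not the full reference, and the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()
response = client.responses.create(model="o4-mini", input="Say hello, then stop.")

# There is no single "assistant" message: the output is a list of typed items.
for item in response.output:
    if item.type == "reasoning":
        pass  # summaries / encrypted content live here, not user-facing text
    elif item.type == "function_call":
        print("tool requested:", item.name, item.arguments)
    elif item.type == "message":
        for part in item.content:
            if part.type == "output_text":
                print(part.text)
```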

Then it is a question not about “when” but about “why”. What is the benefit of a bunch of model thinking and deliberation consuming context length on later inputs? It has been demonstrated with multi-step chain-of-thought reasoning that you can even strip the start of the funnel of knowledge while it is being generated and still get good output.

Reasoning is not like tool calls, which MUST be preserved in chat history for an extended length; otherwise you could train an AI to answer without using them, or end up with a conversation that looks like the user's request was never satisfied by real-world action.