[OpenAI Responses API] “No tool output found for function call” when using previous_response_id — anyone have a stable workaround?

TL;DR: I’m running GPT-5 with tools. The model issues a function_call, I execute the tool, then I send back a function_call_output with the same call_id. Despite that, my next request often fails with 400: "No tool output found for function call <ID>". It looks like a state/timing issue around previous_response_id. Looking for robust patterns that work in production.

Setup / context

  • Model: GPT-5 with high reasoning effort (reasoning.effort: "high"), tools enabled.

  • Flow: model calls a tool → I run the tool → I post back function_call_output with the exact call_id → I continue the conversation.

  • Observed error: On the next responses.create(...), I frequently get:
    400: "No tool output found for function call <ID>" even though call_id matches 1:1.
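
For reference, the tools and messages used in the repro below look roughly like this; the get_weather tool and the user prompt are placeholders I’m using for illustration:

tools = [{
    "type": "function",
    "name": "get_weather",  # placeholder tool for illustration
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]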

Minimal repro (simplified)

import json
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(model="gpt-5", tools=tools, input=messages)
messages += resp.output  # contains the function_call item(s), each with a call_id

# grab the function_call the model emitted so we can echo its call_id back
call = next(item for item in resp.output if item.type == "function_call")

tool_result = run_tool(...)
messages.append({
    "type": "function_call_output",
    "call_id": call.call_id,
    "output": json.dumps(tool_result),
})

resp2 = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=messages,
    previous_response_id=resp.id,  # -> often triggers 400: "No tool output found..."
)

What it feels like

This behaves like a state/race condition in the Responses API when previous_response_id is used; despite sending the correct function_call_output, the session sometimes doesn’t “see” it on the following request.

Workarounds I’ve tried/seen

  1. Manual message management (no previous_response_id)
    Carry the full chat history yourself: append both the model’s function_call items and your function_call_output items to the input, then call responses.create(...) without previous_response_id. This seems to avoid the error in many cases (first sketch after this list).

  2. Streaming strategy
    Stream the model’s response and fulfill tool calls "inline," feeding outputs back during the same streamed exchange. More complex to implement, but it reduces reliance on follow-up requests (second sketch after this list).

  3. Lower reasoning / simpler calls as fallback
    If high-effort reasoning correlates with the issue in your stack, temporarily fall back to a simpler mode as a safety net.
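
For (1), here is a minimal sketch of the manual-history loop, assuming a run_tool(name, args) dispatcher on my side; the point is that there is no previous_response_id, and every function_call in resp.output gets a matching function_call_output in the same input list before the next request:

import json

def run_turn(client, tools, messages):
    # one user turn: keep calling the model until it stops asking for tools
    while True:
        resp = client.responses.create(model="gpt-5", tools=tools, input=messages)
        messages += resp.output  # keep the model's items (incl. function_call) in history

        calls = [item for item in resp.output if item.type == "function_call"]
        if not calls:
            return resp  # no tool calls left, this is the final answer

        for call in calls:  # answer every call before the next request
            result = run_tool(call.name, json.loads(call.arguments))  # placeholder dispatcher
            messages.append({
                "type": "function_call_output",
                "call_id": call.call_id,
                "output": json.dumps(result),
            })

And for (2), roughly how I read the streaming variant: the tool outputs still go back in a follow-up request, but you can collect (and start executing) each tool call as soon as its output item completes in the stream. Event names are the documented Responses streaming events; run_tool is the same placeholder:

stream = client.responses.create(model="gpt-5", tools=tools, input=messages, stream=True)

calls, final = [], None
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)  # surface text as it arrives
    elif event.type == "response.output_item.done" and event.item.type == "function_call":
        calls.append(event.item)                # a tool call is fully formed here
    elif event.type == "response.completed":
        final = event.response

messages += final.output
for call in calls:
    result = run_tool(call.name, json.loads(call.arguments))
    messages.append({
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result),
    })
# then continue with another responses.create(...) as above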

Questions for the community

  • Do you have a battle-tested pattern for Responses API + tools that consistently avoids this error?

  • Are you running without previous_response_id (pure manual message history) in production—and is it stable?

  • Any success stories with the streaming approach as a permanent solution?

  • Other defensive patterns to guarantee the model always “hears” the tool output?

Thanks in advance for any code samples, dos/don’ts, or architectures that worked for you!



I’m running into the exact same issue (streaming plus tool calls with GPT-5). For me, lowering the reasoning effort to "low" fixed it, but I’d like to see a stable long-term solution as well in case I need higher reasoning effort. (I’m using previous_response_id.)
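
For reference, "low" effort here just means passing the reasoning parameter on the request (previous_id stands in for the prior response’s id):

resp = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=messages,
    reasoning={"effort": "low"},  # was "high" when the error showed up
    previous_response_id=previous_id,
)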

Something that might help is setting parallel tool calls to false:

parallel_tool_calls (boolean or null, optional, defaults to true): Whether to allow the model to run tool calls in parallel.

Because when parallel tool calls are enabled (the default), you have to collect the outputs of every called tool and submit them all together in one message. Just tested this and now I don’t get the error message anymore (rough sketch below).
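
To make that concrete, here is roughly what that looks like (run_tool is a placeholder dispatcher; the loop answers however many calls came back, so it works whether or not parallel calls are enabled):

# disable parallel calls so at most one tool call needs answering per response
resp = client.responses.create(model="gpt-5", tools=tools, input=messages,
                               parallel_tool_calls=False)

# either way: answer every function_call before the follow-up request
messages += resp.output
for item in resp.output:
    if item.type == "function_call":
        result = run_tool(item.name, json.loads(item.arguments))
        messages.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(result),
        })

resp2 = client.responses.create(model="gpt-5", tools=tools, input=messages,
                                previous_response_id=resp.id)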
