TL;DR: I’m running GPT-5 with tools. The model issues a `function_call`, I execute the tool, then I send back a `function_call_output` with the same `call_id`. Despite that, my next request often fails with `400: "No tool output found for function call <ID>"`. It looks like a state/timing issue around `previous_response_id`. Looking for robust patterns that work in production.
Setup / context
- Model: GPT-5 with high reasoning effort (`reasoning.effort: "high"`), tools enabled.
- Flow: model calls a tool → I run the tool → I post back `function_call_output` with the exact `call_id` → I continue the conversation.
- Observed error: on the next `responses.create(...)`, I frequently get `400: "No tool output found for function call <ID>"` even though the `call_id` matches 1:1.
Minimal repro (simplified)
```python
import json
from openai import OpenAI

client = OpenAI()  # tools, messages, run_tool defined elsewhere

resp = client.responses.create(model="gpt-5", tools=tools, input=messages)
messages += resp.output  # contains the function_call item with its call_id

# Pull the tool call off the response and execute it
call = next(item for item in resp.output if item.type == "function_call")
tool_result = run_tool(call.name, json.loads(call.arguments))

messages.append({
    "type": "function_call_output",
    "call_id": call.call_id,  # exact call_id from the model
    "output": json.dumps(tool_result),
})

resp2 = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=messages,
    previous_response_id=resp.id,  # -> often triggers 400: "No tool output found..."
)
```
What it feels like
This behaves like a state/race condition in the Responses API when `previous_response_id` is used; despite sending the correct `function_call_output`, the session sometimes doesn’t “see” it on the following request.
Workarounds I’ve tried/seen
- Manual message management (no `previous_response_id`): carry the full chat history yourself. Append both the model’s `function_call` and your `function_call_output` as prior messages, then call `responses.create(...)` without `previous_response_id`. This seems to avoid the error in many cases (first sketch below).
- Streaming strategy: stream the model’s response and fulfill tool calls “inline,” feeding outputs back during the same streamed exchange. More complex to implement, but it reduces reliance on follow-up requests (second sketch below).
- Lower reasoning / simpler calls as a fallback: if high-effort reasoning correlates with the issue in your stack, temporarily fall back to a simpler mode as a safety net (third sketch below).
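Here’s roughly what I mean by the manual-history pattern — a minimal sketch assuming the same `tools` and `run_tool` placeholders as in the repro above. No `previous_response_id`; the full conversation is re-sent on every request:

```python
import json
from openai import OpenAI

client = OpenAI()
history = [{"role": "user", "content": "What's the weather in Berlin?"}]

while True:
    resp = client.responses.create(model="gpt-5", tools=tools, input=history)
    # Keep everything the model produced (including function_call items) in history.
    history += resp.output

    calls = [item for item in resp.output if item.type == "function_call"]
    if not calls:
        break  # no tool calls left; resp.output_text is the final answer

    for call in calls:
        result = run_tool(call.name, json.loads(call.arguments))
        history.append({
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": json.dumps(result),
        })

print(resp.output_text)
```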
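And a rough sketch of the streaming variant using the SDK’s `client.responses.stream(...)` helper (same `tools`/`run_tool`/`history` assumptions; double-check the event names against your openai-python version):

```python
import json
from openai import OpenAI

client = OpenAI()

def stream_turn(history):
    """Stream one turn; print text deltas and collect completed tool calls."""
    pending = []
    with client.responses.stream(model="gpt-5", tools=tools, input=history) as stream:
        for event in stream:
            if event.type == "response.output_text.delta":
                print(event.delta, end="", flush=True)
            elif event.type == "response.output_item.done" and event.item.type == "function_call":
                # The call's arguments are complete; it can be run right away.
                pending.append(event.item)
        final = stream.get_final_response()
    return final, pending

# One round trip: stream, run tools, feed outputs back, stream again.
final, calls = stream_turn(history)
history += final.output
for call in calls:
    result = run_tool(call.name, json.loads(call.arguments))
    history.append({
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result),
    })
if calls:
    final, _ = stream_turn(history)  # follow-up turn with tool outputs included
```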
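Finally, the defensive fallback I’m experimenting with: catch the specific 400, then retry statelessly with the full history and lower effort. This is just my own guess at a safety net, not an official fix:

```python
import openai
from openai import OpenAI

client = OpenAI()

def create_with_fallback(history, prev_id=None):
    """Try the stateful call first; on the specific 400, retry statelessly."""
    try:
        return client.responses.create(
            model="gpt-5",
            tools=tools,
            input=history,
            previous_response_id=prev_id,
            reasoning={"effort": "high"},
        )
    except openai.BadRequestError as e:
        if "No tool output found" not in str(e):
            raise
        # Drop the response link, resend the full history, and dial the
        # reasoning effort down as a safety net.
        return client.responses.create(
            model="gpt-5",
            tools=tools,
            input=history,
            reasoning={"effort": "low"},
        )
```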
Questions for the community
- Do you have a battle-tested pattern for the Responses API + tools that consistently avoids this error?
- Are you running without `previous_response_id` (pure manual message history) in production, and is it stable?
- Any success stories with the streaming approach as a permanent solution?
- Other defensive patterns to guarantee the model always “hears” the tool output?
Thanks in advance for any code samples, dos/don’ts, or architectures that worked for you!