We think we’ve found a bug in the Responses API involving the built-in tool_search tool with client-side execution.
The issue doesn’t seem to be “tool_search always breaks” or “tool_search can only be used once.” It looks more specific than that:
- the model calls tool_search
- we reply with tool_search_output
- that output loads a mix of:
  - a normal top-level function tool
  - a namespaced tool
- later in the same conversation, we include tool_search again
At that point, the follow-up requests start returning HTTP 500.
What makes this look like an API bug is that the same follow-up works fine if we remove tool_search from the request.
What we observed
For the same conversation state and same follow-up input:
- tools=[dummy] -> succeeds
- tools=[tool_search] -> returns 500
- tools=[dummy, tool_search] -> also returns 500
Controls we tried:
- if the earlier tool_search_output loads only a top-level function, no repro
- if it loads only a namespaced tool, no repro
- if it loads only flat functions, no repro
- if it loads one top-level function plus one namespaced tool, repro
So the problem seems to be specifically tied to this state transition:
1. tool_search loads a mixed tool registry
2. a later continuation includes tool_search again
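For concreteness, the continuation request that triggers the 500 is shaped roughly like this (the previous_response_id value is a placeholder, and the tool_search definition is abbreviated; the full definitions are in the repro script):

```python
# Shape of the failing follow-up request (illustrative; previous_response_id
# is a placeholder for the response that consumed the tool_search_output).
failing_body = {
    "model": "gpt-5.4",
    "store": True,
    "previous_response_id": "resp_placeholder",
    "input": "Use one of the loaded function tools now.",
    # Including tool_search here returns 500; replacing it with a plain
    # function tool (or removing it) makes the same request succeed.
    "tools": [{"type": "tool_search", "execution": "client"}],
    "parallel_tool_calls": False,
}
```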
Environment
- Model: gpt-5.4
- Endpoint: POST /v1/responses
- Feature: built-in tool_search with execution: "client"
Why this matters
We use tool_search for deferred client-side tool loading, because different users/accounts/phases expose different tool sets. Reusing tool_search later in the same conversation is a natural part of that flow.
Right now it looks like once tool_search_output has loaded both:
- a top-level function tool, and
- a namespaced tool
the conversation can enter a bad state where adding tool_search again causes a server error.
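Since the same follow-up succeeds once tool_search is removed, one stopgap (our sketch, not something we'd call a fix) is to retry a failed continuation with tool_search stripped out. Here `send` is a placeholder for whatever function POSTs a body to /v1/responses and returns (status, parsed_response):

```python
# Stopgap sketch (assumption, not part of the original repro): if a
# continuation that includes tool_search comes back as a 500, retry the
# identical request with tool_search removed from the tool list.
def continue_with_fallback(send, body):
    status, resp = send(body)
    has_tool_search = any(
        t.get("type") == "tool_search" for t in body.get("tools", [])
    )
    if status == 500 and has_tool_search:
        retry = dict(body)
        retry["tools"] = [
            t for t in body["tools"] if t.get("type") != "tool_search"
        ]
        status, resp = send(retry)
    return status, resp
```

The retry loses the ability to load more tools on that turn, so this only keeps the conversation alive; it doesn't restore the deferred-loading flow.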
Minimal repro script
```python
import json
import http.client
import os
import sys

API_KEY = os.environ.get("OPENAI_API_KEY")
if not API_KEY:
    print("Set OPENAI_API_KEY")
    sys.exit(1)

MODEL = "gpt-5.4"

TOOL_SEARCH = {
    "type": "tool_search",
    "execution": "client",
    "description": "Search for available tools",
    "parameters": {
        "type": "object",
        "properties": {"goal": {"type": "string"}},
        "required": ["goal"],
        "additionalProperties": False,
    },
}

DUMMY = {
    "type": "function",
    "name": "dummy",
    "description": "Dummy tool.",
    "parameters": {
        "type": "object",
        "properties": {},
        "additionalProperties": False,
    },
}

# The mixed registry that triggers the repro: one top-level function
# plus one namespaced tool.
LOADED_REGISTRY = [
    {
        "type": "function",
        "name": "f1",
        "description": "Load more tools.",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}},
            "required": ["goal"],
            "additionalProperties": False,
        },
    },
    {
        "type": "namespace",
        "name": "n1",
        "description": "PDF tools.",
        "tools": [
            {
                "type": "function",
                "name": "f2",
                "description": "Read PDF.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                    "additionalProperties": False,
                },
            }
        ],
    },
]


def req(body):
    """POST body to /v1/responses; return (status, x-request-id, parsed JSON)."""
    conn = http.client.HTTPSConnection("api.openai.com", timeout=120)
    conn.request(
        "POST",
        "/v1/responses",
        body=json.dumps(body),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    resp = conn.getresponse()
    data = resp.read().decode("utf-8")
    request_id = resp.getheader("x-request-id")
    conn.close()
    try:
        parsed = json.loads(data)
    except Exception:
        parsed = {"raw": data}
    return resp.status, request_id, parsed


for i in range(5):
    # 1. Ask the model to call tool_search.
    status1, rid1, r1 = req({
        "model": MODEL,
        "store": True,
        "input": "Search for the tools you need.",
        "tools": [TOOL_SEARCH],
        "parallel_tool_calls": False,
    })
    tool_search_call = next(
        o for o in r1["output"] if o["type"] == "tool_search_call"
    )

    # 2. Reply with a tool_search_output that loads the mixed registry.
    status2, rid2, r2 = req({
        "model": MODEL,
        "store": True,
        "previous_response_id": r1["id"],
        "input": [{
            "type": "tool_search_output",
            "execution": "client",
            "call_id": tool_search_call["call_id"],
            "status": "completed",
            "tools": LOADED_REGISTRY,
        }],
        "tools": [TOOL_SEARCH],
        "parallel_tool_calls": False,
    })

    # 3. Continue the same conversation, once without tool_search and
    #    once with it.
    for label, tools in [
        ("dummy_only", [DUMMY]),
        ("tool_search_only", [TOOL_SEARCH]),
    ]:
        status3, rid3, r3 = req({
            "model": MODEL,
            "store": True,
            "previous_response_id": r2["id"],
            "input": "Use one of the loaded function tools now.",
            "tools": tools,
            "parallel_tool_calls": False,
        })
        out_types = {}
        for item in r3.get("output", []):
            out_types[item["type"]] = out_types.get(item["type"], 0) + 1
        print(i + 1, label, status3, rid3, out_types)
        if status3 != 200:
            print("  error:", r3.get("error", {}).get("message"))
```
Observed output
- dummy_only: 5/5 success
- tool_search_only: 0/5 success, all 500
Sample failed request IDs:
- req_17e3eca5901d44ea9a13a28620d57b7e
- req_89f87691b9954181a8bbaf4002409182
- req_3fc527d744004d8195284519f4bcea90
- req_889351c2abfa49eea68c6314d8ba393e
- req_333e8ad40d844a689da6d16aaea03795
This doesn’t look related to context size. The repro is small. It looks more like a bad interaction between:
- prior tool_search_output state
- loading both top-level and namespaced tools
- then using tool_search again later in the same conversation
Thanks for looking, folks. Happy to provide more details or run additional experiments as needed. And if we’re just holding it wrong, please let us know!
Thanks for the great APIs! <3