Responses API returns 500 when tool_search is included again after loading both top-level and namespaced tools

We think we’ve found a bug in the Responses API involving the built-in tool_search tool with client execution.

The issue doesn’t seem to be “tool_search always breaks” or “tool_search can only be used once.” It looks more specific than that:

- the model calls tool_search

- we reply with tool_search_output

- that output loads a mix of:

  - a normal top-level function tool

  - a namespaced tool

- later in the same conversation, we include tool_search again

At that point, the follow-up requests start returning HTTP 500.

What makes this look like an API bug is that the same follow-up works fine if we remove tool_search from the request.
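
For concreteness, the three request payloads in that flow look roughly like this. The shapes mirror our client code (the full repro script is below); angle-bracket values are placeholders:

```python
# Sketch of the three request payloads in the failing flow. These mirror
# our client code; angle-bracket values are placeholders.

TOOL_SEARCH = {"type": "tool_search", "execution": "client"}

# Turn 1: only tool_search is exposed; the model responds with a
# tool_search_call output item.
turn1 = {
    "model": "gpt-5.4",
    "input": "Search for the tools you need.",
    "tools": [TOOL_SEARCH],
}

# Turn 2: we answer the tool_search_call with a mixed registry
# (one top-level function plus one namespaced tool).
turn2 = {
    "model": "gpt-5.4",
    "previous_response_id": "<resp_1_id>",
    "input": [{
        "type": "tool_search_output",
        "execution": "client",
        "call_id": "<tool_search_call_id>",
        "status": "completed",
        "tools": [
            {"type": "function", "name": "f1"},
            {"type": "namespace", "name": "n1",
             "tools": [{"type": "function", "name": "f2"}]},
        ],
    }],
    "tools": [TOOL_SEARCH],
}

# Turn 3: any later continuation that includes tool_search again -> 500.
# The identical request with a plain function tool instead succeeds.
turn3 = {
    "model": "gpt-5.4",
    "previous_response_id": "<resp_2_id>",
    "input": "Use one of the loaded function tools now.",
    "tools": [TOOL_SEARCH],
}
```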

What we observed

For the same conversation state and same follow-up input:

- tools=[dummy] -> succeeds

- tools=[tool_search] -> returns 500

- tools=[dummy, tool_search] -> also returns 500

Controls we tried:

- if the earlier tool_search_output loads only a top-level function, no repro

- if it loads only a namespaced tool, no repro

- if it loads several flat (top-level) functions, still no namespace, no repro

- if it loads one top-level function plus one namespaced tool, repro
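
To make the trigger condition explicit, here are the control registries as we built them (names match the repro script; the registry shapes are the ones our client sends, trimmed for brevity):

```python
# The control registries from the list above, plus a helper capturing
# what we believe the trigger condition is: a registry that contains
# BOTH a top-level function and a namespace. Names match the repro
# script below; shapes are trimmed to the fields that matter here.

FLAT = {"type": "function", "name": "f1"}
FLAT2 = {"type": "function", "name": "f2"}
NAMESPACED = {"type": "namespace", "name": "n1", "tools": [FLAT2]}

def mixes_flat_and_namespaced(registry):
    """True iff the registry has both a top-level function and a namespace."""
    kinds = {tool["type"] for tool in registry}
    return {"function", "namespace"} <= kinds

controls = {
    "flat_only": [FLAT],              # no repro
    "namespaced_only": [NAMESPACED],  # no repro
    "two_flat": [FLAT, FLAT2],        # no repro
    "mixed": [FLAT, NAMESPACED],      # repro: 500 on the next tool_search turn
}
```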

So the problem seems to be specifically tied to this state transition:

1. tool_search loads a mixed tool registry

2. a later continuation includes tool_search again

Environment

- Model: gpt-5.4

- Endpoint: POST /v1/responses

- Feature: built-in tool_search with execution: "client"

Why this matters

We use tool_search for deferred client-side tool loading, because different users/accounts/phases expose different tool sets. Reusing tool_search later in the same conversation is a natural part of that flow.

Right now it looks like once tool_search_output has loaded both:

- a top-level function tool, and

- a namespaced tool

the conversation can enter a bad state where adding tool_search again causes a server error.
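
Our current stopgap follows directly from the dummy-tool observation above: once a mixed registry has been loaded in a conversation, we strip tool_search from follow-up requests. A minimal sketch (the mixed-registry check is purely our client-side bookkeeping, not an API field):

```python
# Stopgap we use client-side: drop tool_search from follow-up requests
# once the conversation has loaded both a top-level function and a
# namespaced tool. This avoids the 500 but loses the ability to
# re-search for tools later in the conversation.

def followup_tools(tools, loaded_registry):
    kinds = {t["type"] for t in loaded_registry}
    if {"function", "namespace"} <= kinds:
        # Mixed registry was loaded earlier: including tool_search
        # again currently triggers the 500, so strip it.
        return [t for t in tools if t.get("type") != "tool_search"]
    return tools
```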

Minimal repro script

import json
import http.client
import os
import sys

API_KEY = os.environ.get("OPENAI_API_KEY")
if not API_KEY:
    print("Set OPENAI_API_KEY", file=sys.stderr)
    sys.exit(1)

MODEL = "gpt-5.4"

TOOL_SEARCH = {
    "type": "tool_search",
    "execution": "client",
    "description": "Search for available tools",
    "parameters": {
        "type": "object",
        "properties": {"goal": {"type": "string"}},
        "required": ["goal"],
        "additionalProperties": False,
    },
}

DUMMY = {
    "type": "function",
    "name": "dummy",
    "description": "Dummy tool.",
    "parameters": {
        "type": "object",
        "properties": {},
        "additionalProperties": False,
    },
}

LOADED_REGISTRY = [
    {
        "type": "function",
        "name": "f1",
        "description": "Load more tools.",
        "parameters": {
            "type": "object",
            "properties": {"goal": {"type": "string"}},
            "required": ["goal"],
            "additionalProperties": False,
        },
    },
    {
        "type": "namespace",
        "name": "n1",
        "description": "PDF tools.",
        "tools": [
            {
                "type": "function",
                "name": "f2",
                "description": "Read PDF.",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                    "additionalProperties": False,
                },
            }
        ],
    },
]

def req(body):
    conn = http.client.HTTPSConnection("api.openai.com", timeout=120)
    conn.request(
        "POST",
        "/v1/responses",
        body=json.dumps(body),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    resp = conn.getresponse()
    data = resp.read().decode("utf-8")
    request_id = resp.getheader("x-request-id")
    conn.close()
    try:
        parsed = json.loads(data)
    except Exception:
        parsed = {"raw": data}
    return resp.status, request_id, parsed

for i in range(5):
    status1, rid1, r1 = req({
        "model": MODEL,
        "store": True,
        "input": "Search for the tools you need.",
        "tools": [TOOL_SEARCH],
        "parallel_tool_calls": False,
    })
    if status1 != 200:
        print(i + 1, "setup_turn_1_failed", status1, rid1)
        continue
    tool_search_call = next(
        (o for o in r1["output"] if o["type"] == "tool_search_call"), None
    )
    if tool_search_call is None:
        print(i + 1, "no tool_search_call in turn 1 output")
        continue

    status2, rid2, r2 = req({
        "model": MODEL,
        "store": True,
        "previous_response_id": r1["id"],
        "input": [{
            "type": "tool_search_output",
            "execution": "client",
            "call_id": tool_search_call["call_id"],
            "status": "completed",
            "tools": LOADED_REGISTRY,
        }],
        "tools": [TOOL_SEARCH],
        "parallel_tool_calls": False,
    })
    if status2 != 200:
        print(i + 1, "setup_turn_2_failed", status2, rid2)
        continue

    for label, tools in [
        ("dummy_only", [DUMMY]),
        ("tool_search_only", [TOOL_SEARCH]),
    ]:
        status3, rid3, r3 = req({
            "model": MODEL,
            "store": True,
            "previous_response_id": r2["id"],
            "input": "Use one of the loaded function tools now.",
            "tools": tools,
            "parallel_tool_calls": False,
        })
        out_types = {}
        for item in r3.get("output", []):
            out_types[item["type"]] = out_types.get(item["type"], 0) + 1
        print(i + 1, label, status3, rid3, out_types)
        if status3 != 200:
            print("  error:", r3.get("error", {}).get("message"))

Observed output

- dummy_only: 5/5 success

- tool_search_only: 0/5 success, all 500

Sample failed request IDs:

- req_17e3eca5901d44ea9a13a28620d57b7e

- req_89f87691b9954181a8bbaf4002409182

- req_3fc527d744004d8195284519f4bcea90

- req_889351c2abfa49eea68c6314d8ba393e

- req_333e8ad40d844a689da6d16aaea03795

This doesn’t look related to context size. The repro is small. It looks more like a bad interaction between:

- prior tool_search_output state

- loading both top-level and namespaced tools

- then using tool_search again later in the same conversation

Thanks for looking, folks. Happy to provide more details or run additional experiments as needed. Also, if we’re just holding it wrong, please let us know!

Thanks for the great APIs! <3