I’m using the Responses API with gpt-4.1 and MCP tools.
All tools are served from one MCP server, but they are exposed on different SSE paths and have different server_labels:
/clari/sse (server_label: clari)
/salesforce/sse (server_label: salesforce)
/hubspot/sse (server_label: hubspot)
Request setup
- If I provide only one SSE path and its tools in the request, the tools for that server_label are listed and streamed correctly: all expected events (in_progress, completed, and done) are emitted.
- If I provide multiple SSE paths and their tools in the same request, and the model uses MCP tools from more than one server_label, the problem occurs: only one response.output_item.done is emitted.
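For reference, the multi-label request described above can be sketched roughly as follows. This is a minimal illustration, not my exact code: the host https://mcp.example.com, the helper names build_mcp_tools and stream_event_types, and the require_approval value are assumptions; only the model, the /<label>/sse paths, and the labels come from the setup above.

```python
from typing import Dict, Iterator, List


def build_mcp_tools(base_url: str, labels: List[str]) -> List[Dict[str, str]]:
    """Build one MCP tool entry per server_label, each pointing at its own SSE path."""
    root = base_url.rstrip("/")
    return [
        {
            "type": "mcp",
            "server_label": label,
            "server_url": f"{root}/{label}/sse",
            "require_approval": "never",  # assumption: approvals are disabled
        }
        for label in labels
    ]


# Hypothetical host; only the /<label>/sse paths are taken from the post.
tools = build_mcp_tools("https://mcp.example.com", ["clari", "salesforce", "hubspot"])


def stream_event_types(prompt: str) -> Iterator[str]:
    """Yield the type of each streamed event (requires the openai package and an API key)."""
    from openai import OpenAI  # imported lazily so the builder above runs without it

    client = OpenAI()
    stream = client.responses.create(
        model="gpt-4.1", input=prompt, tools=tools, stream=True
    )
    for event in stream:
        yield event.type
```

Passing a single label to build_mcp_tools corresponds to the working case; passing all three corresponds to the case where the problem shows up.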
Expected behavior
If the model uses tools from more than one server_label in a single request, the streaming output should include, for each label:
- response.mcp_list_tools.in_progress
- response.mcp_list_tools.completed
- response.output_item.done
Actual behavior
I do get in_progress and completed events for all server_labels, but I only receive one response.output_item.done event, usually for one of the labels.
# Hubspot tools
{"type": "response.mcp_list_tools.in_progress", "server_label": "hubspot"}
{"type": "response.mcp_list_tools.completed", "server_label": "hubspot"}
# Salesforce tools
{"type": "response.mcp_list_tools.in_progress", "server_label": "salesforce"}
{"type": "response.mcp_list_tools.completed", "server_label": "salesforce"}
# Only one 'done' emitted (Salesforce)
{"type": "response.output_item.done", "server_label": "salesforce"}
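To make the gap easier to spot, a small check along these lines can group the events by label and report which labels never got a done event. It is only a sketch: the helper name missing_done is hypothetical, and the event dicts follow the simplified shape of the log above, which may not match the full payload the API actually sends.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

DONE_EVENT = "response.output_item.done"


def missing_done(events: Iterable[Dict[str, str]]) -> List[str]:
    """Return the server_labels that appear in the stream but never get a done event."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        label = event.get("server_label")
        if label is not None:
            seen[label].add(event["type"])
    return sorted(label for label, types in seen.items() if DONE_EVENT not in types)


# The events from the log above, in the same simplified shape.
sample = [
    {"type": "response.mcp_list_tools.in_progress", "server_label": "hubspot"},
    {"type": "response.mcp_list_tools.completed", "server_label": "hubspot"},
    {"type": "response.mcp_list_tools.in_progress", "server_label": "salesforce"},
    {"type": "response.mcp_list_tools.completed", "server_label": "salesforce"},
    {"type": "response.output_item.done", "server_label": "salesforce"},
]

print(missing_done(sample))  # prints ['hubspot']
```

With the expected behavior, the returned list would be empty for every request.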
Questions:
- Is this the intended behavior of the Responses API when multiple MCP server_labels are used in a single request?
- If not, could this be a bug in how response.output_item.done is emitted?