Responses API only emits response.output_item.done for one server_label when multiple tools from different server_labels are used in a single response

I’m using the Responses API with gpt-4.1 and MCP tools.
All tools are served from one MCP server, but are exposed on different SSE paths and have different server_labels:

/clari/sse (server_label: clari)

/salesforce/sse (server_label: salesforce)

/hubspot/sse (server_label: hubspot)

Request setup

  • If I provide only one SSE path and its tools in the request, the tools for that server_label are listed and streamed correctly — all expected events (in_progress, completed, and done) are emitted.

  • If I provide multiple SSE paths and their tools in the same request, and the model uses MCP tools from more than one server_label, the problem occurs: only one response.output_item.done is emitted.
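For context, here is a minimal sketch of how the multi-label tools list for such a request might be assembled. The base URL and the helper name build_mcp_tools are placeholders (the real host is not shown in this post), and require_approval set to "never" is an assumption about the setup, not something confirmed above.

```python
# Hypothetical base URL; the actual MCP server host is not given in this post.
BASE_URL = "https://mcp.example.com"


def build_mcp_tools(labels, base_url=BASE_URL):
    """Return one MCP tool entry per server_label, each pointing at its own SSE path."""
    return [
        {
            "type": "mcp",
            "server_label": label,
            "server_url": f"{base_url}/{label}/sse",
            # Assumption: approvals are disabled so tool calls run without prompts.
            "require_approval": "never",
        }
        for label in labels
    ]


# Multi-label case: all three SSE paths go into the same request.
tools = build_mcp_tools(["clari", "salesforce", "hubspot"])
```

The resulting list would be passed as the tools argument of a streaming Responses API call (for example client.responses.create(model="gpt-4.1", tools=tools, input=..., stream=True) in the OpenAI Python SDK). The single-label case is the same call with only one entry in the list.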

Expected behavior
If the model uses tools from more than one server_label in a single request, the streaming output should include, for each label:

  • response.mcp_list_tools.in_progress

  • response.mcp_list_tools.completed

  • response.output_item.done

Actual behavior
I do get in_progress and completed events for all server_labels,
but I receive only one response.output_item.done event, for a single label.

# Hubspot tools

{"type": "response.mcp_list_tools.in_progress", "server_label": "hubspot"}

{"type": "response.mcp_list_tools.completed", "server_label": "hubspot"}

# Salesforce tools

{"type": "response.mcp_list_tools.in_progress", "server_label": "salesforce"}

{"type": "response.mcp_list_tools.completed", "server_label": "salesforce"}

# Only one 'done' emitted (Salesforce)

{"type": "response.output_item.done", "server_label": "salesforce"}
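
To make the gap easy to check, here is a small sketch that groups captured events by server_label and reports which of the three expected event types never arrived. It assumes the simplified event shape shown in the log above (a top-level "type" and "server_label" on every event); the function name missing_events is only illustrative.

```python
import json
from collections import defaultdict

# The three event types expected once per server_label.
EXPECTED = {
    "response.mcp_list_tools.in_progress",
    "response.mcp_list_tools.completed",
    "response.output_item.done",
}


def missing_events(lines):
    """Map each server_label to the expected event types that were not seen."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# ..." annotations in the log
        event = json.loads(line)
        seen[event["server_label"]].add(event["type"])
    return {
        label: sorted(EXPECTED - types)
        for label, types in seen.items()
        if EXPECTED - types
    }


# The events from the log above, one JSON object per line.
log = [
    '{"type": "response.mcp_list_tools.in_progress", "server_label": "hubspot"}',
    '{"type": "response.mcp_list_tools.completed", "server_label": "hubspot"}',
    '{"type": "response.mcp_list_tools.in_progress", "server_label": "salesforce"}',
    '{"type": "response.mcp_list_tools.completed", "server_label": "salesforce"}',
    '{"type": "response.output_item.done", "server_label": "salesforce"}',
]

result = missing_events(log)
print(result)  # {'hubspot': ['response.output_item.done']}
```

Run against the log above, it flags hubspot as the label that never received response.output_item.done.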

Questions:

  1. Is this the intended behavior of the Responses API when multiple MCP server_labels are used in a single request?

  2. If not, could this be a bug in how response.output_item.done is emitted?