Responses API: Parallel Tool Calls Not Happening

I’m experimenting with the new Responses API, but I’m only ever seeing one function call per response, even though I’ve set "parallel_tool_calls": true.

My Responses API payload

{
  "input": [
    {
      "content": [
        { "text": "Can you check my tasks and notes on HubSpot?", "type": "input_text" }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o",
  "instructions": "....",
  "parallel_tool_calls": true,
  "store": false,
  "user": "...",
  "tool_choice": "auto",
  "tools": [
    { "type": "web_search_preview" },
    {
      "name": "Tool_Nango_Hubspot_OwnerTaskList",
      "description": "Get my HubSpot tasks",
      "parameters": {...}
    },
    {
      "name": "Tool_Nango_Hubspot_OwnerNoteList",
      "description": "Get my HubSpot notes",
      "parameters": {...}
    }
  ]
}

What I get back

{
  "output": [
    {
      "type": "function_call",
      "name": "Tool_Nango_Hubspot_OwnerTaskList",
      "arguments": {
        "action_description": "Retrieving the most recent tasks for from HubSpot.",
        "object_type": "task",
        "sort_by": "hs_lastmodifieddate"
      }
    }
  ],
  "parallel_tool_calls": true,
  …
}

What I expected

• Both Tool_Nango_Hubspot_OwnerTaskList and Tool_Nango_Hubspot_OwnerNoteList to be emitted in the same Responses API reply, so I can execute them in parallel on my end.

Actual behavior

• Only the Tool_Nango_Hubspot_OwnerTaskList tool is ever called; the assistant stops before calling the Tool_Nango_Hubspot_OwnerNoteList tool.


Questions

  1. Does the Responses API currently support true parallel function invocation in GPT‑4o?

  2. Are there additional flags, instruction formats, or tool‑ordering requirements I’m missing?

  3. Any known limitations or best practices for getting multiple function calls in a single Responses API response?

Thanks in advance for any insights!

You might be missing that non-stream output is a list that needs to be iterated over. You can’t simply grab [0] and expect full contents there.

Correct tool placement

Descriptive non-strict function

{
  "name": "weather_conditions",
  "description": "current weather. Supports parallel call by placing in multi_tool_use",
  "strict": false,
  "parameters": {
    "type": "object",
    "required": [
      "location_city"
    ],
    "properties": {
      "location_city": {
        "type": "string"
      }
    },
    "additionalProperties": false
  }
}

Multiple needs, multiple parallel calls

The need to iterate, demonstrated:

>>> response.output[0]
...                     
ResponseFunctionToolCall(arguments='{"location_city":"San Francisco"}', call_id='call_EV6RlpSglyqjLBBkdmTL0gKC', name='weather_conditions', type='function_call', id='fc_67f54a025e988192ab0d8dbd537a54de0e33792bb0a1c99e', status='completed')

>>> response.output[1]
...                     
ResponseFunctionToolCall(arguments='{"location_city":"San Jose"}', call_id='call_QeZa7OjZKHWrhN2uJxdTXi7x', name='weather_conditions', type='function_call', id='fc_67f54a029c9c81929db97c6e9d21cb330e33792bb0a1c99e', status='completed')
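
Or, rather than indexing each element, a quick sketch of walking the whole output list with the Python SDK:

# Sketch: collect every function call from the Responses API output list
# instead of grabbing a single index.
import json

for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)  # arguments arrive as a JSON string
        print(item.call_id, item.name, args)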


I have set strict: false on each of the tools, and I am not just looking at output[0]; what I included above is the full output list.

I’m also using the Go SDK.

Then it is just down to model quality and instruction-following. I didn’t have to use the parallel parameter at all; it only exists for disabling the internal parallel tool, saving you some tokens and some error-prone usage.

If you are getting tool use, but a pattern where the model otherwise understands the need to call the tools iteratively, you can even mandate in the function description that the function cannot be used directly and is only to be used by parallel placement - sent to the parallel method of the multi_tool_use tool recipient, phrased literally in terms of what the AI sees internally.

Here’s the JSON body of the successful simulation above.

{
  "model": "gpt-4o",
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "You are weatherpal"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What's the temperature difference between SFO and San Jose?"
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  },
  "reasoning": {},
  "tools": [
    {
      "type": "function",
      "name": "weather_conditions",
      "description": "current weather. Supports parallel call by placing in multi_tool_use",
      "parameters": {
        "type": "object",
        "required": [
          "location_city"
        ],
        "properties": {
          "location_city": {
            "type": "string"
          }
        },
        "additionalProperties": false
      },
      "strict": false
    }
  ],
  "temperature": 1,
  "max_output_tokens": 2048,
  "top_p": 1,
  "store": false
}

Which is easy when you haven’t got tons of tokens of web results or file search results in context, and when your instruction-following quality isn’t being damaged by system message injection.

Interesting. When using the Chat Completions API with the same tool names, descriptions, and message input text, it does call the tools in parallel, so I’m curious why the Responses API doesn’t behave the same.

Let’s look at context placement as a cause:

Responses

## multi_tool_use

// This tool serves as a wrapper for utilizing multiple tools. Each tool that can be used must be specified in the tool sections. Only tools in the functions namespace are permitted.
// Ensure that the parameters provided to each tool are valid according to that tool's specification.
namespace multi_tool_use {

// Use this function to run multiple tools simultaneously, but only if they can operate in parallel. Do this even if the prompt suggests using the tools sequentially.
type parallel = (_: {
// The tools to be executed in parallel. NOTE: only functions tools are permitted
tool_uses: {
// The name of the tool to use. The format should either be just the name of the tool, or in the format namespace.function_name for plugin and function tools.
recipient_name: string,
// The parameters to pass to the tool. Ensure these are valid according to the tool's own specifications.
parameters: object,
}[],
}) => any;

} // namespace multi_tool_use

Chat Completions

## multi_tool_use

// This tool serves as a wrapper for utilizing multiple tools. Each tool that can be used must be specified in the tool sections. Only tools in the functions namespace are permitted.
// Ensure that the parameters provided to each tool are valid according to that tool's specification.
namespace multi_tool_use {

// Use this function to run multiple tools simultaneously, but only if they can operate in parallel. Do this even if the prompt suggests using the tools sequentially.
type parallel = (_: {
// The tools to be executed in parallel. NOTE: only functions tools are permitted
tool_uses: {
// The name of the tool to use. The format should either be just the name of the tool, or in the format namespace.function_name for plugin and function tools.
recipient_name: string,
// The parameters to pass to the tool. Ensure these are valid according to the tool's own specifications.
parameters: object,
}[],
}) => any;

} // namespace multi_tool_use

Tool for sending in parallel seems the same.

And the function?

Wait: what’s this??

    {
      "name": "Tool_Nango_Hubspot_OwnerTaskList",
      "description": "Get my HubSpot tasks",
      "parameters": {...}
    },

Reminder: Responses has a different function format than Chat Completions; definitions cannot be dropped in unchanged. Five required keys at the top level of each function object for you:

    {
      "type": "function",
      "name": "Tool_Nango_Hubspot_OwnerTaskList",
      "description": "Retrieves users tasks. Send in Parallel with any other non-dependent Hubspot function call.",
      "strict": false,
      "parameters": {...

Sorry, when I was crafting the message I was trying to reduce the payload to make it easier to read, and I missed those two keys in the function definitions. Here’s the full payload for the request:

{
  "input": [
    {
      "content": [
        {
          "text": "Can you load up my hubspot tasks and notes?",
          "type": "input_text"
        }
      ],
      "role": "user"
    }
  ],
  "model": "gpt-4o",
  "instructions": "...",
  "parallel_tool_calls": true,
  "store": false,
  "user": "...",
  "tool_choice": "auto",
  "tools": [
    {
      "type": "web_search_preview"
    },
    {
      "name": "Tool_Nango_Hubspot_OwnerTaskList",
      "parameters": {
        "properties": {
          "action_description": {
            "description": "A brief description of the action to be performed.",
            "type": "string"
          },
          "object_type": {
            "description": "The type of object. Must be 'task' for this tool.",
            "enum": [
              "task"
            ],
            "type": "string"
          },
          "sort_by": {
            "description": "Property to sort the results by.",
            "enum": [
              "hs_lastmodifieddate",
              "hs_createdate"
            ],
            "type": "string"
          }
        },
        "required": [
          "action_description",
          "object_type",
          "sort_by"
        ],
        "type": "object"
      },
      "strict": false,
      "description": "Retrieves the 25 most recent tasks owned by the authenticated user in HubSpot. Use this tool to get tasks assigned to you.",
      "type": "function"
    },
    {
      "name": "Tool_Nango_Hubspot_OwnerNoteList",
      "parameters": {
        "properties": {
          "action_description": {
            "description": "A brief description of the action to be performed.",
            "type": "string"
          },
          "object_type": {
            "description": "The type of object. Must be 'note' for this tool.",
            "enum": [
              "note"
            ],
            "type": "string"
          },
          "sort_by": {
            "description": "Property to sort the results by.",
            "enum": [
              "hs_lastmodifieddate",
              "hs_createdate"
            ],
            "type": "string"
          }
        },
        "required": [
          "action_description",
          "object_type",
          "sort_by"
        ],
        "type": "object"
      },
      "strict": false,
      "description": "Retrieves the 25 most recent notes owned by the authenticated user in HubSpot. Use this tool to get notes created by you.",
      "type": "function"
    }
  ]
}

I just had a thought here - web search may be forcing strict on itself and producing token-level enforcement. The same strict you can’t use yourself if you want parallel calls emitted.

Plus, the output of web search basically damages what follows.

You might have to make a choice: either save yourself the tokens that can never be employed and simply turn off the parallel tool, or make web search a function of your own that you stay in control of.
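
If you go the second route, a rough sketch of what a self-managed web search function could look like (the tool name, schema, and the search_the_web backend here are all placeholders, nothing built into the API):

# Sketch: expose web search as an ordinary function tool you control, so it
# follows the same parallel-call rules as your other functions.
# "search_the_web" is a placeholder for whatever search backend you run.
web_search_tool = {
    "type": "function",
    "name": "web_search",  # hypothetical name
    "description": "Search the web. Can be sent in parallel with other non-dependent tools.",
    "strict": False,
    "parameters": {
        "type": "object",
        "required": ["query"],
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "additionalProperties": False,
    },
}

def run_web_search(arguments: dict) -> str:
    # Placeholder backend call; return text for the model to read.
    return search_the_web(arguments["query"])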

Ah ok. I thought I had already tried removing the web_search tool and still got the same behavior, but I just tried it again, and parallel tool calling does seem to be working now when the web search tool is not included in the list of tools. Do you think this is by design, or a bug?

I am often lost for OpenAI’s motivations in general…

Web search is not available on Chat Completions as a tool, so that gives us an immediate cause for a difference, both in the placed context and in its returns clouding the distance back to the tool definitions. It also means we cannot compare it to anything that does work.

If it absolutely cannot be overcome, then it would appear to be a case of one tool in your list, out of your control, carrying the impression of "strict": true along with it.

First, have a tool whose request can’t be answered by a web search:

Run again with web search tool also added…

Answer received.

That makes sense, thanks for your help!

I’m not sure if this is a similar situation, but I’m also having some trouble getting multiple tools to run in parallel.

from pydantic import BaseModel
from agents import Agent, ModelSettings  # openai-agents SDK

class TextOutput(BaseModel):
    name: str
    bio: str

text_agent = Agent(name="text_agent", instructions="You are a biographer", output_type=TextOutput)

class ImageOutput(BaseModel):
    url: str

image_agent = Agent(name="image_agent", instructions="You are an image generator", output_type=ImageOutput, tools=[custom_image_generator_tool])

triage_agent = Agent(name="triage_agent", instructions="You are a triage agent and yada yada yada", tools=[text_agent.as_tool(), image_agent.as_tool()], model_settings=ModelSettings(parallel_tool_calls=True, tool_choice="required"))

I use run_streamed, and when I check for the "tool_call_output_item" event I expect the text_agent to return its output pretty quickly and the image_agent to take a while. However, both take a while and return their output nearly simultaneously, about 45 seconds later.

Looking at my trace, both tool calls trigger almost right away, but both take about 45 seconds to respond. Setting parallel_tool_calls=True and not setting it at all makes no difference.

When I remove the image_agent from the triage tools list, the text_agent tool output returns in like 2 seconds.

You seem to leave it as an exercise to the reader to figure out what you are doing with your Pydantic class objects.

However I can answer in general what you might be experiencing.

Parallel tool calls require all the responses to be collected before they can be submitted back to the AI model. You must have matching call IDs between the AI’s list of tool calls and the outputs you return.

Parallel thus would be an accelerator for a task like “generate five of your secure API keys for me”, returning in the two seconds that each would take instead of iterating sequentially through all five for ten seconds.

A long-running tool used in parallel means the whole turn is dominated by the time spent waiting on the most time-consuming tool.
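
As a rough sketch of that round trip over the Responses API (INPUT, TOOLS, and run_my_tool are placeholders for your own messages, function definitions, and dispatcher):

# Sketch: run the model's parallel function calls concurrently, then send
# every result back with a matching call_id.
import asyncio
import json
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def execute(call):
    # Dispatch to your own tool implementation (placeholder); it should
    # return a string for the model to read.
    result = await run_my_tool(call.name, json.loads(call.arguments))
    return {"type": "function_call_output", "call_id": call.call_id, "output": result}

async def main():
    response = await client.responses.create(
        model="gpt-4o", input=INPUT, tools=TOOLS, parallel_tool_calls=True, store=False
    )
    calls = [item for item in response.output if item.type == "function_call"]

    # Wall-clock time here is the slowest tool, not the sum of all tools.
    outputs = await asyncio.gather(*(execute(c) for c in calls))

    # Echo the calls back alongside their outputs so the IDs line up.
    call_items = [
        {"type": "function_call", "call_id": c.call_id, "name": c.name, "arguments": c.arguments}
        for c in calls
    ]
    follow_up = await client.responses.create(
        model="gpt-4o", input=INPUT + call_items + outputs, tools=TOOLS, store=False
    )
    print(follow_up.output_text)

asyncio.run(main())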

Web and File search both prevent custom functions from getting called in parallel over the Responses API (at least when using stream=True requests to AsyncOpenAI.responses.create()).

If you see this, OpenAI, a fix would be great! Latency and cost quickly add up when the model makes two, three, or more non-parallel tool calls.

I’m facing the same issue and have already tried the approach you mentioned, but no luck.

Edited:
I discovered that it doesn’t work when using file_search.

Also, the beta Assistants v2 API handles parallel tool calling well, even when file_search is enabled.

Right, the issue is that OpenAI wants us to move from the Assistants API to the Responses API ASAP and will be deprecating Assistants soon…

This is the terminology they use, but let’s face it, it’s de facto deprecated already.

I am seeing the same. parallel_tool_calls and max_tool_calls seem to be completely ignored if web_search_preview is included along with function definitions. No matter the instructions or user prompt, if web_search_preview is active, only one function call will be made at a time. Otherwise, parallel function calls work correctly.
This makes the web search feature unusable for us, and it also caused some delay in debugging the unexpected interaction before luckily finding this thread (thanks alec7 & _j).
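
For anyone who wants to reproduce the difference, a minimal non-streaming sketch (MY_FUNCTIONS stands in for your own Responses-format function definitions):

# Sketch: count how many function calls the model emits with and without the
# built-in web search tool in the tool list.
from openai import OpenAI

client = OpenAI()
prompt = [{"role": "user", "content": "Check my tasks and my notes."}]

for tools in (MY_FUNCTIONS, [{"type": "web_search_preview"}] + MY_FUNCTIONS):
    response = client.responses.create(
        model="gpt-4o", input=prompt, tools=tools, parallel_tool_calls=True
    )
    calls = sum(1 for item in response.output if item.type == "function_call")
    print(f"{len(tools)} tools offered -> {calls} function call(s) emitted")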

1 Like