Responses API: Parallel Tool Calls Not Happening

Then it is just down to model quality and instruction-following. I didn't need to set the parallel_tool_calls parameter at all; it exists only to disable the internal parallel tool wrapper, which saves you some tokens and some error-prone usage.
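
If you do want to switch it off, here is a minimal sketch of where that parameter sits in the Python SDK (assuming the openai package's responses.create call; input_items and tools stand in for the same request pieces shown in the JSON body below):

from openai import OpenAI

client = OpenAI()

# parallel_tool_calls is on by default for the Responses API; you would only
# pass False here to strip the internal multi_tool_use wrapper and save its tokens.
response = client.responses.create(
    model="gpt-4o",
    input=input_items,   # same system/user messages as the JSON body below
    tools=tools,         # same weather_conditions function as the JSON body below
    parallel_tool_calls=False,
    store=False,
)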

If you are getting tool use at all, and the model otherwise understands that it would need to call the function iteratively, you can even mandate in the function description that the function cannot be called directly and is only to be used by parallel placement - that is, sent to the parallel method of the multi_tool_use tool recipient, to match literally what the AI sees internally.

Here’s the JSON body of the successful simulation above.

{
  "model": "gpt-4o",
  "input": [
    {
      "role": "system",
      "content": [
        {
          "type": "input_text",
          "text": "You are weatherpal"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "input_text",
          "text": "What's the temperature difference between SFO and San Jose?"
        }
      ]
    }
  ],
  "text": {
    "format": {
      "type": "text"
    }
  },
  "reasoning": {},
  "tools": [
    {
      "type": "function",
      "name": "weather_conditions",
      "description": "current weather. Supports parallel call by placing in multi_tool_use",
      "parameters": {
        "type": "object",
        "required": [
          "location_city"
        ],
        "properties": {
          "location_city": {
            "type": "string"
          }
        },
        "additionalProperties": false
      },
      "strict": false
    }
  ],
  "temperature": 1,
  "max_output_tokens": 2048,
  "top_p": 1,
  "store": false
}

All of which is easy when the context isn’t stuffed with tons of web search or file search result tokens, and instruction-following quality hasn’t been degraded by system message injection.
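
When the model does go parallel, each call arrives as its own function_call item in the same output list. A minimal sketch of handling that (assuming the openai Python SDK; look_up_weather is a hypothetical local function, and input_items/tools are the same request pieces as the JSON above):

import json
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input=input_items,
    tools=tools,
    store=False,
)

# A parallel call surfaces as multiple function_call items in one output list.
calls = [item for item in response.output if item.type == "function_call"]

tool_outputs = []
for call in calls:
    args = json.loads(call.arguments)                 # arguments arrive as a JSON string
    result = look_up_weather(args["location_city"])   # hypothetical local implementation
    tool_outputs.append({
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result),
    })

# With store set to false, echo the prior function_call items back as input
# alongside the results so the model can produce its final answer.
followup = client.responses.create(
    model="gpt-4o",
    input=input_items + [c.model_dump() for c in calls] + tool_outputs,
    tools=tools,
    store=False,
)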