Model tries to call unknown function multi_tool_use.parallel

About 7.5% of all our API calls to gpt-4o result in a multi_tool_use.parallel call. Some of them can be unpacked with workaround code we had to add.

To extract the actual tool calls (a sketch follows this list):

  1. Parse the arguments JSON object.
  2. Iterate over its tool_uses array and, for each entry:
     - take the function name from the recipient_name value
     - treat parameters as the JSON object of function arguments
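
A minimal Python sketch of that unpacking, assuming a tool call shaped like the examples later in this thread (the function name and dict layout here are illustrative, not the exact SDK objects):

import json

def unpack_multi_tool_use(tool_call):
    # Pass normal tool calls through unchanged
    if tool_call["function"]["name"] != "multi_tool_use.parallel":
        return [tool_call]

    args = json.loads(tool_call["function"]["arguments"])  # step 1
    calls = []
    for use in args["tool_uses"]:                          # step 2
        calls.append({
            "function": {
                # "functions.submit_document" -> "submit_document"
                "name": use["recipient_name"].rsplit(".", 1)[-1],
                "arguments": json.dumps(use["parameters"]),
            }
        })
    return calls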

It would be great to see a fix on the OpenAI API side, so we do not have to monitor for these weird failures and manually patch responses.

Here are the examples and how we correct them: Learnings · paulz/ai-apps-reliability Wiki · GitHub


openai-python 1.35 adds an explicit option to disable parallel calling.
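
If it helps, the call looks roughly like this; a sketch, assuming the parameter is named parallel_tool_calls and using a stand-in tool definition:

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "submit_document",  # stand-in tool borrowed from this thread
        "description": "Submit the current document.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

# parallel_tool_calls=False asks the model for at most one tool call
# per turn, which sidesteps multi_tool_use.parallel entirely
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Submit the document."}],
    tools=tools,
    parallel_tool_calls=False,
)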

Hi guys, interesting discussion. Has anyone tested describing to the model how it should call tools in parallel?

Here is what I would do:

  1. Add all tools on the same server (through a gateway) on distinct endpoints, keeping the request structures similar so as not to confuse the model
  2. In the description of each tool, add a note about whether the tool can be called within a multi tool call (see the sketch after this list)
  3. Add a “multi tool call” endpoint with a detailed definition of how to call multiple tools simultaneously, with examples
  4. Add detailed workflow descriptions to the system prompt explaining how and why to call multiple tools simultaneously
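
For step 2, a sketch of what such descriptions might look like, assuming Chat Completions style tool definitions (the tool names are borrowed from the examples in this thread; the wording is hypothetical):

tools = [
    {
        "type": "function",
        "function": {
            "name": "patch_properties",
            "description": "Patch document properties. "
                           "Note: may be called together with other tools in one turn.",
            "parameters": {
                "type": "object",
                "properties": {
                    "corrected_properties": {"type": "array", "items": {"type": "object"}},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "submit_document",
            "description": "Submit the current document. "
                           "Note: must NOT be combined with other tools in the same turn.",
            "parameters": {"type": "object", "properties": {}},
        },
    },
]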

Run tests to see what this does to performance, and optimize from there…

Let me know what you think, and whether anyone has already tested this approach.

I encountered it today, so this bug is still around.

I am betting that this weirdness is an artifact of training that was later discarded.

Fortunately, it’s a consistent pattern. I wrote a simple work-around in Rust that has been working for months without issue:

use serde::Deserialize;
use serde_json::Value;

#[derive(Deserialize, Debug, Clone)]
pub struct GPTHallucinatedFunctionCall {
    pub tool_uses: Vec<HallucinatedToolCalls>,
}

#[derive(Deserialize, Debug, Clone)]
pub struct HallucinatedToolCalls {
    pub recipient_name: String,
    pub parameters: Value,
}
// Sometimes GPT decides to wrap the tools in a `multi_tool_use.parallel`:
// { tool_calls: [ToolCall {
//     id: "call_y38JQQUmdYjTbYdJ3dIgAdFR",
//     call_type: "function",
//     function: Function {
//         name: "multi_tool_use.parallel",
//         arguments: "{\"tool_uses\":[{\"recipient_name\":\"functions.submit_document\",\"parameters\":{}}]}"
//     }
// }] }
// Search through `tool_requests` and determine if any name is "multi_tool_use.parallel"
for tool in tool_requests {
    let function = &tool.function;
    if function.name == "multi_tool_use.parallel" {
        // Caught: we need to deserialize the wrapped arguments
        let caught_calls =
            serde_json::from_str::<GPTHallucinatedFunctionCall>(&function.arguments).unwrap();

        for tool_use in caught_calls.tool_uses {
            let tool = ToolCall {
                id: tool.id.clone(),
                call_type: tool.call_type.clone(),
                function: Function {
                    // "functions.submit_document" -> "submit_document"
                    name: tool_use
                        .recipient_name
                        .rsplit('.')
                        .next()
                        .unwrap()
                        .to_string(),
                    arguments: serde_json::to_string(&tool_use.parameters).unwrap(),
                },
            };
            tools_requested.push(tool);
        }
    }
}

The idea is that the actual arguments are found nested inside the arguments property.

I asked Claude to convert this code into Python:

from typing import List
from dataclasses import dataclass
import json

@dataclass
class GPTHallucinatedFunctionCall:
    tool_uses: List['HallucinatedToolCalls']

@dataclass
class HallucinatedToolCalls:
    recipient_name: str
    parameters: dict

@dataclass
class Function:
    name: str
    arguments: str

@dataclass
class ToolCall:
    id: str
    call_type: str
    function: Function

def process_tool_requests(tool_requests):
    tools_requested = []

    for tool in tool_requests:
        function = tool.function
        if function.name == "multi_tool_use.parallel":
            # We need to deserialize the arguments. Note: an object_hook
            # would fire for every nested dict here, so build the
            # dataclasses explicitly instead.
            raw = json.loads(function.arguments)
            caught_calls = GPTHallucinatedFunctionCall(
                tool_uses=[HallucinatedToolCalls(**use) for use in raw["tool_uses"]]
            )
            tool_uses = caught_calls.tool_uses

            for tool_use in tool_uses:
                new_tool = ToolCall(
                    id=tool.id,
                    call_type=tool.call_type,
                    function=Function(
                        name=tool_use.recipient_name.rsplit('.', 1)[-1],
                        arguments=json.dumps(tool_use.parameters)
                    )
                )
                tools_requested.append(new_tool)
        else:
            # Pass normal tool calls through unchanged
            tools_requested.append(tool)

    return tools_requested

And of course a unit test to ensure that it’s working (I did not test the Python code). It just prints out the result. Lazy, I know.

    #[test]
    fn test_openai_be_like_iTs_pARAlLeL_guIsE(){

        let request2 = vec![
            ToolCall {
                id: "call_Vdmu1Lo2A7GXN82xAIvK1vHk".to_string(), 
                call_type: "function".to_string(), 
                function: Function { 
                    name: "multi_tool_use.parallel".to_string(),
                    arguments: "{\"tool_uses\":[{\"recipient_name\":\"functions.patch_properties\",\"parameters\":{\"corrected_properties\":[{\"key\":\"is_refund\",\"updated_value\":\"true\"}]}},{\"recipient_name\":\"functions.submit_document\",\"parameters\":{}}]}".to_string()
                } 
            }
        ];

        let result = handle_tool_request(&request2).unwrap();

        println!("Result: {:#?}", result);
    }

I think Claude actually made it a workable unit test during the conversion to Python. Nice(?)

import unittest
import json

# Assuming the previous code is in a file named gpt_function_call.py
from gpt_function_call import ToolCall, Function, process_tool_requests

class TestOpenAIParallelFunction(unittest.TestCase):
    def test_openai_be_like_its_parallel_guise(self):
        request2 = [
            ToolCall(
                id="call_Vdmu1Lo2A7GXN82xAIvK1vHk",
                call_type="function",
                function=Function(
                    name="multi_tool_use.parallel",
                    arguments=json.dumps({
                        "tool_uses": [
                            {
                                "recipient_name": "functions.patch_properties",
                                "parameters": {
                                    "corrected_properties": [
                                        {
                                            "key": "is_refund",
                                            "updated_value": "true"
                                        }
                                    ]
                                }
                            },
                            {
                                "recipient_name": "functions.submit_document",
                                "parameters": {}
                            }
                        ]
                    })
                )
            )
        ]

        result = process_tool_requests(request2)

        # Add assertions to check the result
        self.assertEqual(len(result), 2)
        
        self.assertEqual(result[0].function.name, "patch_properties")
        self.assertEqual(json.loads(result[0].function.arguments), {
            "corrected_properties": [{"key": "is_refund", "updated_value": "true"}]
        })
        
        self.assertEqual(result[1].function.name, "submit_document")
        self.assertEqual(json.loads(result[1].function.arguments), {})

        # Print the result for debugging
        print("Result:", result)

if __name__ == '__main__':
    unittest.main()
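
Since the file ends with unittest.main(), you can run it directly; this assumes you saved it as e.g. test_gpt_function_call.py next to gpt_function_call.py:

python test_gpt_function_call.py -v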

I got this error today with gpt-4o. The issue is still around (and it seems no one cares, based on the age of this thread).
Should this be reported somewhere else?



Is this really a bug or a feature to allow multi-tool, parallel function calling?

It is a model quality issue. The AI is sending its output to the wrong tool, writing output meant for a different internal tool into your functions.