Ridiculous Number of Redundant Tool Calls

I have been working on a text-to-SQL prompt, and the LLM made 73 tool calls in one request, 70 of which were the exact same call. The LLM should only need one call per user question. I am curious if anyone has ideas about what is going on… my best guess is that something in the prompt is confusing the LLM.

Here is the process flow:

  1. Take a user question and create an SQL statement
  2. Use a tool call to run the SQL in the DB (a sketch of _run_query itself follows this list)
  • Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "_run_query",
            "description": "Run SQL in database. Will return a list of dictionaries where each element represents a row.",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "sql_query": {
                        "type": "string",
                        "description": "syntactically correct SQL statement"
                    }
                },
                "additionalProperties": False,
                "required": ["sql_query"]
            }
        }
    }
]
  • Model Call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0,
    seed=20,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "_run_query"}},
    parallel_tool_calls=False
)
  3. Turn the result into a natural language answer
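
For reference, _run_query itself is just a thin wrapper that executes the SQL and returns each row as a dictionary. A minimal sketch, assuming a SQLite database via the standard-library sqlite3 module (the actual driver and connection details in my code differ):

import sqlite3

def _run_query(sql_query: str) -> list[dict]:
    # Execute the SQL and return each row as a dict keyed by column name.
    # Assumes a local SQLite file; swap in your own driver/connection.
    conn = sqlite3.connect("sales.db")
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(sql_query).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()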

Here’s an example:

user_question = "How many apples have been sold?"


response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a SQL expert..."},  # the actual prompt is about 100 lines
        {"role": "user", "content": user_question}
    ],
    temperature=0,
    seed=20,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "_run_query"}},
    parallel_tool_calls=False
)
# There will be 73 tool calls, and they will all have the exact same argument
response_message = response.choices[0].message
print(
    len(response_message.tool_calls)
)

# [ChatCompletionMessageToolCall(id='call_id1', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function'),
# ChatCompletionMessageToolCall(id='call_id2', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function'),
# ...
# ChatCompletionMessageToolCall(id='call_id73', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function')]
print(
    response_message.tool_calls
)
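
For completeness, step 3 is supposed to work like this: execute the single requested tool call, append the result as a tool message, and call the model again without forcing a tool so it answers in plain language. A rough sketch of that follow-up, continuing from the response above and assuming the messages list from the first call is kept in a variable named messages (names are illustrative):

import json

tool_call = response_message.tool_calls[0]      # expect exactly one call here
args = json.loads(tool_call.function.arguments)
result = _run_query(args["sql_query"])          # run the SQL against the DB

messages.append(response_message)               # the assistant turn that requested the tool
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})

# Second call: no forced tool_choice, so the model produces the natural-language answer
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0,
    seed=20,
)
print(final.choices[0].message.content)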

Have you simplified this?

How does the tool/LLM learn the structure of the DB schema?

Do you mean the output creates more outputs based on questions that trigger calls to a DB? Your phrasing is unclear, or I am lacking context, but if you are making too many calls, something is off in your code. Maybe your DB contains too many similar entries, or improper storage causes multiples to be called by the code? What does the actual call look like?

The prompt is 113 lines. It contains some instructions on SQL formatting as well as the DDL with descriptions of the columns.

Additionally, I have been testing other questions and it handles them correctly.

I added a sample to illustrate the issue I have.

Interestingly, it is creating a valid SQL query that is most likely close to or exactly the result I am expecting, but it is asking for it too many times.


I added a sample of what it looks like, this should clear things up. Looking forward to your thoughts!

Ok, so what I am getting out of this is that you send the prompt 73 times and you don’t know why?

If there is not much other code calling this, it might be a bug, or you have code somewhere that retries a bunch of times and does it too often. It might be a built-in function not working properly.

Have you tried running your script in a different environment with different versions of Python?

You have no issues ever with your internet connection?

What is happening is that I make one API call; then, in the response, the model made 73 tool call requests.

In the function calling docs, the Function Calling Steps section shows that you can parse the function arguments from a response like this:

# Step 3
tool_call = response.output[0]
args = json.loads(tool_call.arguments)

result = get_weather(args["latitude"], args["longitude"])

The sample code shows only one tool call. However, I get 73, and almost all of them are the same thing.
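
For what it's worth, that docs snippet appears to be for the Responses API (response.output); with Chat Completions, which I am using, the tool calls come back as a list on the assistant message, so the equivalent parsing looks roughly like this (sketch, not my exact code):

import json

response_message = response.choices[0].message
for tool_call in response_message.tool_calls:   # I expect one entry, I get 73
    args = json.loads(tool_call.function.arguments)
    print(tool_call.id, args["sql_query"])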

Ahh, I see what is going on now. I am guessing your instructions are not clear or are too difficult for 4o-mini. Did you train this model for such specific output? Maybe you should train it to produce different IDs with different outputs; perhaps you trained it on copy-pasted data or something like that?

Yeah I think it is something to do with my prompting.

I was trying few-shot prompting, and when I removed the examples it began behaving as expected and produced one tool call. I have continued testing and the issue no longer happens.

Additionally, I added "You must only call _run_query once per user question." to the prompt and changed the tool description to "Run SQL in database. Will return a list of dictionaries where each element represents a row. Can only be called once per user question." The updated tool definition is shown below.
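
Concretely, the only change to the tool definition was the description; it now reads:

tools = [
    {
        "type": "function",
        "function": {
            "name": "_run_query",
            "description": (
                "Run SQL in database. Will return a list of dictionaries where each "
                "element represents a row. Can only be called once per user question."
            ),
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "sql_query": {
                        "type": "string",
                        "description": "syntactically correct SQL statement"
                    }
                },
                "additionalProperties": False,
                "required": ["sql_query"]
            }
        }
    }
]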

This has solved it.

Still, it is such odd behavior to generate so many tool calls, especially since they all have the same argument.