Ridiculous Number of Redundant Tool Calls

I have been working on a text-to-SQL prompt, and the LLM made 73 tool calls in one request, 70 of which were the exact same call. The LLM should only need one call per user question. I am curious if anyone has ideas about what is going on… my best guess is that something in the prompt is confusing the LLM.

Here is the process flow:

  1. Take a user question and create an SQL statement
  2. Use a tool call to run the SQL in the DB (a sketch of _run_query itself follows this list)
  • Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "_run_query",
            "description": "Run SQL in database. Will return a list of dictionaries where each element represents a row.",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "sql_query": {
                        "type": "string",
                        "description": "syntactically correct SQL statement"
                    }
                },
                "additionalProperties": False,
                "required": ["sql_query"]
            }
        }
    }
]
  • Model Call
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0,
    seed=20,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "_run_query"}},
    parallel_tool_calls=False
)
  3. Turn the result into a natural language answer
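
For reference, _run_query itself is just a thin wrapper that executes the SQL and returns each row as a dictionary. A minimal sketch, assuming a SQLite database via the standard-library sqlite3 module (the actual driver and connection details in my code differ):

import sqlite3

def _run_query(sql_query: str) -> list[dict]:
    # Execute the SQL and return each row as a dict keyed by column name.
    # Assumes a local SQLite file; swap in your own driver/connection.
    conn = sqlite3.connect("sales.db")
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(sql_query).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()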

Here’s an example:

user_question = "How many apples have been sold?"


response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a SQL expert..."},  # the actual prompt is about 100 lines
        {"role": "user", "content": user_question}
    ],
    temperature=0,
    seed=20,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "_run_query"}},
    parallel_tool_calls=False
)
# There will be 73 tool calls, and they will all have the exact same argument
response_message = response.choices[0].message
print(
    len(response_message.tool_calls)
)

# [ChatCompletionMessageToolCall(id='call_id1', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function'),
# ChatCompletionMessageToolCall(id='call_id2', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function'),
# ...
# ChatCompletionMessageToolCall(id='call_id73', function=Function(arguments='{"sql_query": "SELECT COUNT(1) FROM SALES WHERE PRODUCT = \'apple\'"}', name='_run_query'), type='function')]
print(
    response_message.tool_calls
)
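
For completeness, step 3 is supposed to work like this: execute the single requested tool call, append the result as a tool message, and call the model again without forcing a tool so it answers in plain language. A rough sketch of that follow-up, continuing from the response above and assuming the messages list from the first call is kept in a variable named messages (names are illustrative):

import json

tool_call = response_message.tool_calls[0]      # expect exactly one call here
args = json.loads(tool_call.function.arguments)
result = _run_query(args["sql_query"])          # run the SQL against the DB

messages.append(response_message)               # the assistant turn that requested the tool
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})

# Second call: no forced tool_choice, so the model produces the natural-language answer
final = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0,
    seed=20,
)
print(final.choices[0].message.content)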

Have you simplified this?

How does the tool/LLM learn the structure of the DB schema?

Do you mean the output creates more outputs based on questions that trigger calls to a DB? Your phrasing is unclear, or I am lacking context, but if you are making too many calls, something is off in your code. Maybe your DB contains too many similar entries, or improper storage causes multiples to be called by the code? What does the actual call look like?

The prompt is 113 lines. It contains some instructions on SQL formatting as well as the DDL with descriptions of the columns.

Additionally, I have been testing other questions and it handles them correctly.

I added a sample to illustrate the issue I have.

Interestingly, it is creating a valid SQL query that is most likely close to or exactly the result I am expecting, but it is asking for it too many times.


I added a sample of what it looks like, this should clear things up. Looking forward to your thoughts!

Ok, so what I am getting out of this is that you send the prompt 73 times and you don’t know why?

If there is not much other code calling this, it might be a bug, or you have code somewhere that retries a bunch of times and does it too often. It might be a built-in function not working properly.

Have you tried running your script in a different environment with different versions of Python?

You have no issues ever with your internet connection?

What is happening is that I make one API call; then, in the response, the model made 73 tool call requests.

In the function calling docs, the Function Calling Steps section shows that you can parse the function arguments from a response like this:

# Step 3
tool_call = response.output[0]
args = json.loads(tool_call.arguments)

result = get_weather(args["latitude"], args["longitude"])

The sample code shows only one tool call. However, I get 73, and almost all of them are the same thing.
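
For what it's worth, that docs snippet appears to be for the Responses API (response.output); with Chat Completions, which I am using, the tool calls come back as a list on the assistant message, so the equivalent parsing looks roughly like this (sketch, not my exact code):

import json

response_message = response.choices[0].message
for tool_call in response_message.tool_calls:   # I expect one entry, I get 73
    args = json.loads(tool_call.function.arguments)
    print(tool_call.id, args["sql_query"])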

Ahh, I see what is going on now. I am guessing your instructions are not clear or are too difficult for 4o-mini. Did you train this model for such specific output? Maybe you should train it to produce different IDs with different outputs; perhaps you trained it on copy-pasted data or something like that?

Yeah I think it is something to do with my prompting.

I was trying few-shot prompting, and when I removed the examples it began behaving as expected and produced one tool call. I have continued testing and the issue no longer happens.

Additionally, I added "You must only call _run_query once per user question." to the prompt and changed the tool description to "Run SQL in database. Will return a list of dictionaries where each element represents a row. Can only be called once per user question." The updated tool definition is shown below.
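
Concretely, the only change to the tool definition was the description; it now reads:

tools = [
    {
        "type": "function",
        "function": {
            "name": "_run_query",
            "description": (
                "Run SQL in database. Will return a list of dictionaries where each "
                "element represents a row. Can only be called once per user question."
            ),
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "sql_query": {
                        "type": "string",
                        "description": "syntactically correct SQL statement"
                    }
                },
                "additionalProperties": False,
                "required": ["sql_query"]
            }
        }
    }
]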

This has solved it.

Still, it is such odd behavior to generate so many tool calls, especially since they all have the same argument.