Function calling looping uncontrollably and calling unnecessarily

Trying out a simple prompt (ours is more complex, but this is the bare bones and replicates the bug behavior):

You are a helpful assistant.

Respond with a json object containing the following keys:
- message: the message to the client
- done: a boolean set to true only if the user confirms they have no further questions

I am able to get completions shaped as requested and everything works as expected. If I add a get_weather tool to either model, the tool is ALWAYS called. Even when I'm not asking about weather at all, the weather tool is called. If I respond from the tool that weather is unavailable, it will simply call the tool again.

My expectation would be that the model should respond with an answer to the query rather than requesting weather data when there is nothing in the message history requesting weather data.

Am I just doing something wrong here? This is the BAREBONES setup in playground and I am not able to get it to behave as expected.
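
For reference, here is roughly the same setup as a plain API call outside the playground. The get_weather definition and the user question are just placeholders to illustrate; any tool seems to trigger the same behavior:

from openai import OpenAI

client = OpenAI()

# Placeholder tool definition; the specific tool doesn't matter, it gets called regardless.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

system_prompt = (
    "You are a helpful assistant.\n\n"
    "Respond with a json object containing the following keys:\n"
    "- message: the message to the client\n"
    "- done: a boolean set to true only if the user confirms they have no further questions"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What are your support hours?"},  # nothing to do with weather
    ],
    tools=tools,
    response_format={"type": "json_object"},  # combining this with tools is what triggers the looping
)

# tool_calls is populated even though the question has nothing to do with weather
print(response.choices[0].message.tool_calls)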

Thoughts?

You must have selected both the function and a response format of type json_object. Change it to text and you shouldn't have that issue.

are you saying it's not possible to have tools in a completion request and expect the output as a json value? It currently works this way in our production codebase; however, there are times where there's infinite looping of function calling.

update: i just set the response type to text, kept everything else the same…and…lo and behold, it works.

So…should I not set the response type to json if I want a json response? It still seemed to generate the json as described in the prompt despite text being set.

Do you know how this interacts with structured outputs? Suppose I turn my prompt schema into a json_schema; will the problem with function calling persist?

Sorry if I'm asking a lot of questions, but I'm days into this and covering my bases.

If you're testing structured outputs in the playground then you'll need to provide an actual schema for it by selecting json_schema as the response format, then inputting the schema in the window that pops up.

Introducing Structured Outputs in the API | OpenAI

You can check the API docs for more info on the different response format types.
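
For the message/done shape from your prompt, the response_format would look something like this (the schema name is just a placeholder, and client/messages are whatever you already use in your chat.completions call):

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "client_reply",  # placeholder name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "message": {
                    "type": "string",
                    "description": "the message to the client",
                },
                "done": {
                    "type": "boolean",
                    "description": "true only if the user confirms they have no further questions",
                },
            },
            "required": ["message", "done"],
            "additionalProperties": False,
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # structured outputs need a model that supports json_schema
    messages=messages,
    response_format=response_format,
)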

yep, doing that now, but having an issue with it: when it does call the tool, responding to the tool seems to cause an unknown error:

An error occurred. Either the engine you requested does not exist or there was another issue processing your request. If this issue persists please contact us through our help center at https://help.openai.com.

any thoughts on that?

also, I'm noticing that despite asking for a JSON response, I'm getting text back from the model on almost every response. This was the reason why we set the response format to JSON. Why does the combination of "json_object" with JSON format in the prompt yield endless looping over functions?

Have the exact same issue. A lot of my apps are designed this way and it's suddenly stopped working. Switching to structured outputs will take time as I've specified most of these json schemas as plain-text structures. It used to work perfectly till like 2 weeks ago.

deactivate memory, change browsers, or clear your cache

No, this is a real problem that arose with the introduction of structured outputs and the "json_schema" response format.
Using "json_object" wasn't a problem until recently, even on GPT-4o, but now using this type of JSON output with function calls is totally incompatible.
We've had runaway usage as a result of looping parallel calls (thousands of dollars), and even when working around this, the GPT-4o models behave erratically and can call the same function several times in a row with the same parameters, even though the model is receiving the correct result.
If you go further and question the model after removing its ability to call the function, its explanation is that it needs to check again and again whether the function gives a consistent result.
I reported this bug to OpenAI 2 days ago.

Have you found a solution to this? I have noticed from a previous poster's suggestion that changing back to text for the response type has unblocked us, and I haven't seen the issues with tool calls looping. The model does sometimes return plain text when I am asking for JSON, so that is less reliable. It might be tunable with prompt engineering…

We are still experimenting for now, as our engineering cost to move to structured outputs is similarly high. Our occurrence rate for these infinite loops is fairly low and has mostly occurred in local testing. We hope to get ahead of this and find a solution before it becomes more widespread.

Yes, in fact, switching back to text is the only valid solution.
The problem can be reproduced in a minute from the OpenAI Playground.
All you have to do is select the "json_object" format and add a function (the weather example, for instance, is already set up).
With that setup, the GPT-4o family simply keeps calling the function.
What's most disturbing is when you switch back to text format during the conversation: the model can reply again, doesn't understand why it felt compelled to call the function, and explains that it must be a bug.
So I switched back to text on my production platform…

You would never want to use tool calling and JSON mode together in the same inference, and the fact that playground allows this is a bug in playground. Here is when to use each:

Json mode: when you don't know (or care about) the structure of your data, but you do know you need it back as a json object. Example: you pass in a blob of text headers and you're using the LLM to give you back a headers dict that you could use for http requests.

Tool calling: you do know the structure AND you're legitimately using the LLM to call tools OR you need more flexible extraction than structured outputs can provide (datetime, defaults, etc).

Structured outputs: You do know the structure and you have no need for further LLM generation from the direct response to it (tool calling).

Now, there are some creative combinations that can occur such as: LLM tool call → tool response → structured output response. But you would still never supply the model with a JSON schema and then force it into json mode.

TLDR: Json mode when structure is undefined. Tool calling or structured outputs when it is. Never used together.
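
A rough sketch of that tool call → tool response → structured output combination with the Python SDK, where messages, tools, response_format, and run_tool are stand-ins for your own conversation history, tool definitions, json_schema format, and tool-execution code:

from openai import OpenAI

client = OpenAI()

# First call: let the model decide whether to call tools; leave response_format at its text default.
first = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,  # conversation so far
    tools=tools,        # e.g. a get_weather definition
)

msg = first.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that contains the tool calls
    for call in msg.tool_calls:
        messages.append(
            {
                "role": "tool",
                "tool_call_id": call.id,
                "content": run_tool(call),  # stand-in for whatever actually executes the tool
            }
        )

# Second call: with the tool results in the history, ask for the final answer as a
# structured output instead of forcing json_object mode alongside the tools.
final = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=response_format,  # a json_schema response format for your output shape
)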

Preface: Use gpt-4o-2024-08-06 specifically for the json_schema response format (structured outputs)

Functions + json_schema is giving errors, against expectation, with chat completions playground code.

Documentation: Structured Outputs is available in two forms in the OpenAI API:

  1. When using function calling
  2. When using a json_schema response format

Documentation doesn't spell out that these are exclusive, separate uses.

I made a function requiring two arrays about the same entity-extraction items, where the AI must call the function to find out each item's class instead of making its own determination (is "tomato" a vegetable or a fruit according to us?).

The AI somewhat unexpectedly employs parallel tool calls to split the entities it asks about. I type up a response for each function call because I don't actually have answering code.
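
A sketch of the kind of definition I mean (the names and descriptions here are made up for illustration):

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_item_classes",  # made-up name
            "description": "Look up the class of each extracted item instead of deciding it yourself.",
            "parameters": {
                "type": "object",
                "properties": {
                    "items": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "The extracted entity names, e.g. 'tomato'",
                    },
                    "item_contexts": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "The text each item was extracted from, in the same order",
                    },
                },
                "required": ["items", "item_contexts"],
                "additionalProperties": False,
            },
        },
    }
]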

(playground preset at this point)

  • text: responds
  • json_object: responds (after placing a "json" in prompt)
  • json_schema: gives a UI error "NetworkError when attempting to fetch resource."

Actual network error 500:
{
  "error": {
    "message": "The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if you keep seeing this error. (Please include the request ID req_49e9d8106e89eb6bbebbe24c9aaf215d in your email.)",
    "type": "server_error",
    "param": null,
    "code": null
  }
}

Since sending to the API is not blocked, and a reason is not returned, I assume there's a structure validation conflict - like the response mode turning on function strictness, and the JSON output not being parsable by the tool-recipient determiner (or the opposite).


Also annoying in playground: there's a switch to go between json and plain text, but it seems to add and remove whitespace that I prohibit, making what was actually received unclear.

I'm not defending the state of the playground, but you are demonstrating an issue with the playground itself and not the API. Your scenario works in a notebook with the API. I would suggest prototyping stuff like this in your own notebook instead of playground.

You seem extremely knowledgeable on when and how to use the various config options in the completions api and have given advice that I don't think I've seen anywhere in the API docs. Do you have a resource you'd recommend?

Our basic use case is the completions endpoint as a chatbot wrapped by a thin API layer to achieve some data fetching/context augmentation in the course of a visitor conversation via tool calling. Our current structure: we pass the available tools and the transcript history to the completions endpoint with every request, request a completion, respond to any intermediate tool calls where applicable, and keep "running context" like visitor info etc. in the completion response (hence JSON output).

Not asking you to solve our use case, but if you have any thoughts or a good resource to understand how best to architect this that you could share, that would be swell.

Can you help me understand the "running context"? Are you trying to store a conversation rollup in a JSON object in the context window? Are you using the LLM to do this? Are you expecting the LLM to call a tool AND provide this running context in the same API call? Are you able to give me a better idea on the sequence and workflow?

So essentially we might keep in the json response a running summary of key data from the transcript, such as a visitor's contact info, an order number, whether the conversation can/should be marked as completed, something like that. The expectation is more or less realtime (or fairly quick round trips), so we want to limit loops to the LLM.

A conversation might look like this:

User: hey, can you tell me where my package is?

Assistant (JSON): { message: 'Sure, what is your tracking number and name?', name: '', tracking: '', done: false }

User: yeah, it's Mike, and tracking is 12345

Assistant (TOOL CALL): { tool_call: get_order_by_tracking, tracking: 12345 }

Tool (TOOL RESPONSE): shipped out on 3/3, arrival expected by 3/12

Assistant (JSON): { message: 'Your package was mailed on 3/3 and should arrive by 3/12. Feel free to reach out again if you haven't received your package by then. Can I help you with something else?', name: 'Mike', tracking: '12345', done: false }

User: no, that's it, thanks

Assistant: { message: 'It was a pleasure helping you today!', name: 'Mike', tracking: '12345', done: true }

The done field basically tells us we can close the thread out and run analytics against it; the other stuff is used by the intermediate message-processing functions.

This is a contrived example, but a glimpse into our usage.

This is probably best answered with a code snippet. Please let me know if you have any questions.

import json

import pydantic
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


class PrivateState(pydantic.BaseModel):
    """This keeps track of the current (private) state of the conversation for the backend."""

    is_customer_query_resolved: bool = pydantic.Field(
        description="Only mark this field as True if the customer has explicitly indicated that "
        "their query has been resolved and there is nothing else that you can assist them with."
    )
    tracking_numbers: list[str] | None = pydantic.Field(
        description="The tracking numbers IF any are indicated in the chat."
    )


class ResponseFramework(pydantic.BaseModel):
    """Use this to reply to customers and manage the conversation"""

    response_to_customer: str = pydantic.Field(
        description="When answering queries, always ask if there are other "
        "ways to help within the context of the conversation."
    )
    private_state: PrivateState


messages = [
    {
        "role":"system",
        "content": "You are a package tracking assistant. Only consider the user's query resolved after "
        "asking them if there's anything else you can assist them with and they explicitly reply "
        "indicating that there is nothing else you can help them with."
        },
    {
        "role": "user",
        "content": "hey, can you tell me where my package is? My tracking number is H1234",
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "1",
                "type": "function",
                "function": {
                    "name": "get_order_by_tracking",
                    "arguments": json.dumps({"tracking": "H1234"}),
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "1",
        "content": json.dumps(
            {
                "shipped_on": "2024-01-01",
                "expected_delivery": "2024-01-03",
            }
        ),
    },
]

r = client.beta.chat.completions.parse(
    model="gpt-4o-mini", messages=messages, response_format=ResponseFramework
)
p = r.choices[0].message.parsed
print(json.dumps(p.model_dump(), indent=2))


# {
#   "response_to_customer": "Your package with tracking number H1234 was shipped on January 1, 2024, and is expected to be delivered by January 3, 2024. Is there anything else I can assist you with?",
#   "private_state": {
#     "is_customer_query_resolved": false,
#     "tracking_numbers": [
#       "H1234"
#     ]
#   }
# }

Yes, that's why I frame the problem in my first sentence as being within the playground. Whack "get code".

But using the share, and switching up the output, you can see that the disclaimed methods of "solution" actually work fine, and it is the AI model quality and input context that is creating more responses that start with a function-call token rather than a decision to write to the user.