Function Calling very unreliable

I just migrated our chatbot, which previously used system prompts to define functions and parsed the responses by hand, over to the new function calling.

It seems to be worse at getting the syntax of the inputs right than before, and it regularly falls into failure modes where it passes the wrong parameters to the function. After that, it does not recover.

Going to check if I’m making some mistakes in the prompting.

Now my system is actually less flexible than before, because for example:

  • it can only do one call per response
  • there is no documented way to respond while a long-running function has been started but not finished
  • it very easily goes into irrecoverable feedback loops where it keeps repeating its own errors

I also tried with the new GPT-4 model. Same results. And GPT-4 was quite reliable with my previous “manual” method.

Any ideas? Tips?

5 Likes

Consider that the new model might have been trained on how to recognize a function it can call, but it has no training on YOUR function.

I would start by giving it a system prompt that describes the purpose of the chatbot and the functions it can call (more than the functions themselves), and showing it multi-shot example turns of both responding directly and calling a function.
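For example, the few-shot turns could look roughly like this in the messages list (the getWeather function and its arguments are made up purely for illustration):

    # Hypothetical few-shot history: one turn answered directly, one turn that
    # calls a (made-up) getWeather function, followed by its result and answer.
    messages = [
        {"role": "system", "content": "You are a travel assistant. Use the provided functions when the user asks for live data."},
        # Direct answer, no function call
        {"role": "user", "content": "What should I pack for a beach trip?"},
        {"role": "assistant", "content": "Sunscreen, a hat, swimwear, and a light jacket for the evenings."},
        # A function call: the assistant turn carries a function_call instead of text
        {"role": "user", "content": "What's the weather in Lisbon right now?"},
        {"role": "assistant", "content": None,
         "function_call": {"name": "getWeather", "arguments": "{\"city\": \"Lisbon\"}"}},
        # The function result goes back in a "function" role message
        {"role": "function", "name": "getWeather", "content": "{\"temp_c\": 24, \"sky\": \"clear\"}"},
        {"role": "assistant", "content": "It's 24 °C and clear in Lisbon right now."},
    ]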

And then a different prompt for how to present function information.

If the output takes a while to generate from your function call, you can always just throw up your own “processing” message in code.

2 Likes

Consider that the new model might have been trained on how to recognize a function it can call, but it has no training on YOUR function.

But isn’t that what function calling is supposed to solve? I’m explaining my functions to GPT using their schema.

I was thinking about adding example function calls to the system prompt. How would I format the example function calls?

From previous experience with a chatbot running internally in our company: even with a lot of examples, it can start hallucinating incorrect syntax and semantics quite often.

2 Likes

I’m trying an approach:

Generate multiple (e.g. 5) completions.
For each completion that contains a function call:
  if the function exists and the provided parameters match the schema:
    execute the function call
    if it succeeds:
      return the result
If no function call succeeded:
  if a non-function-call completion is present:
    respond with its text
  else:
    report the failure to call the function
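A rough Python sketch of that approach (the jsonschema validation and the local `registry` of callables are my own placeholder choices, not part of the OpenAI API):

    import json
    import jsonschema
    import openai

    def answer_with_retries(messages, functions, registry):
        # Generate several candidate completions in one request.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
            functions=functions,
            temperature=0.2,
            n=5,
        )
        text_fallback = None
        for choice in resp["choices"]:
            msg = choice["message"]
            call = msg.get("function_call")
            if call is None:
                text_fallback = text_fallback or msg.get("content")
                continue
            name = call["name"]
            schema = next((f for f in functions if f["name"] == name), None)
            if schema is None:
                continue  # hallucinated function name
            try:
                args = json.loads(call["arguments"])
                jsonschema.validate(args, schema["parameters"])
            except (json.JSONDecodeError, jsonschema.ValidationError):
                continue  # malformed or non-conforming arguments
            try:
                return registry[name](**args)  # run the matching local function
            except Exception:
                continue
        if text_fallback:
            return text_fallback
        return "Sorry, I could not produce a valid function call."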

I will also add a list of correct usage examples of the function calls to the beginning of the message history.

1 Like

Did you pass your function call schema to the system prompt?
Did you use low temperature?
Are your function and parameter descriptions concise and coherent?

If all questions can be answered with “Yes”, then there must be other issues.

Could you provide an example?

1 Like

Low temperature: yes.

I’m using OpenAI’s new Function Calling API (described in their API documentation).

This is the new recommended way to describe functions to GPT. With the new models, as far as I understand, the system prompt is only supposed to contain more general instructions that don’t relate directly to function calls.

I also tried putting the schema in both the system message and the functions message with no obvious improvement.
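For reference, a minimal request with the schema passed in the dedicated functions parameter looks roughly like this (the getWeather schema is just an illustrative placeholder):

    import openai

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What's the weather in Berlin?"},
        ],
        functions=[
            {
                "name": "getWeather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            }
        ],
        function_call="auto",  # let the model decide whether to call the function
    )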

1 Like

Would be strange if that were necessary, given that the function description JSON object you pass in is added to the context anyway (that’s why it takes up tokens).

3 Likes

From others’ experience and my own, passing the function object into the system message has given the best results so far, since the new model was fine-tuned for system-message attention. In terms of token count it doesn’t matter where in the request you put it.

1 Like

The function message is the new role for putting the result of a function call back into the conversation. GPT will generate an appropriate answer from this result and the user’s prompt.
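For context, a function result goes back to the model roughly like this (the getNews name and payload are placeholders; the assistant turn that requested the call should already be in `messages`):

    # Append the result as a "function" role message and ask the model
    # for the user-facing answer.
    messages.append({
        "role": "function",
        "name": "getNews",
        "content": "{\"articles\": [{\"title\": \"US economy grows in Q2\"}]}",
    })
    followup = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
    )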

1 Like

So you’re saying there’s a bug or lower performance when doing it the way OpenAI recommends? Their documentation and examples don’t use the system message for functions anymore.

1 Like

The function definitions are appended to the body of the request and injected into the system message. It is important to describe in the system message what the assistant needs to do. A simple “I’m a helpful and friendly assistant” is not enough.

But I would suggest treating a function call as a one-shot request in the system message, even if you are handling a conversation.

1 Like

I’ve noticed that system messages can impact behavior erratically when they are appended subsequent to the user messages. I wonder if the order of appending the function message matters, too.

1 Like

@t.haferlach - I’m going to follow this thread - I’m having problems with multiple calls - are you experiencing this? I will be pushing a simple open-source Python wrapper to GitHub soon that allows you to programmatically create your functions and pass them to ChatGPT so you don’t need to write JSON objects. Basically this:

    # If you've used SqlClient or OracleClient, this is similar.
    # Create your function, and then add parameters.
    # Then you add your function to the "chatFunctions" dictionary object
    # (a dictionary is used to allow subsequent function lookup for security,
    # to make sure you're allowed to execute the function).
    # The to_json() function will turn the dictionary into a list for ChatCompletion consumption.
    f = function(name="getNews", description="News API function")
    f.properties.add(property("module", PropertyType.string, "Python Module to Call to run function", True, default="functions.newsapi_org"))
    f.properties.add(property("q", PropertyType.string, "Query to return news stories", True))
    f.properties.add(property("language", PropertyType.string, "Language of News", True, ["en", "es"], default="en"))
    f.properties.add(property("sortBy", PropertyType.string, "Sort By item", False))
    f.properties.add(property("pageSize", PropertyType.integer, "Page Size", True))
    chatFunctions[f.name] = f

    f = function(name="getCurrentDateTime", description="Obtain the current date time in GMT format")
    chatFunctions[f.name] = f

    prompt = "What time is it and find several current news stories on the US economy."
    res = oai.callwithfunction(prompt, chatFunctions.to_json())

This works well. However, the current issue I am having is that the request goes into an infinite loop: it requests getCurrentDateTime, then requests getNews successfully, but then requests getCurrentDateTime again, then getNews, and so on, rather than finishing with a “stop” finish_reason (request completed).

Any thoughts would be appreciated!

1 Like

I’m seeing the looping behavior happen (especially with the temperature set to low) quite a lot.

I tried to add some sentences to the system prompt to tell it not to go into these kinds of feedback loops. Maybe the “presence penalty” parameter could help.

But as I see it, the problem is that once it has generated a response to a function call followed by another function call, it starts treating that as a pattern and just continues it blindly.

Still trying to figure this part out myself. As a workaround, one could manually tell it not to use a function call after counting, say, two function calls.
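If you count the calls yourself, the function_call request parameter can enforce this: it accepts "none" to force a plain text reply. A rough sketch (the counter variable is something you would maintain yourself):

    # Hypothetical guard: after two consecutive function calls, force a plain
    # text answer by setting function_call="none" for the next request.
    force_text = consecutive_function_calls >= 2   # counter you maintain yourself
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="none" if force_text else "auto",
    )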

@akansel adding the system message at the end regularly makes it completely forget to respond to the last user message. I wouldn’t recommend it.

2 Likes

@t.haferlach - I thought I was onto something when reviewing the returned “content” - the first time I provide the function list, it seems to mirror back the list (my hope was that it knew which functions it wanted). When I ran the subsequent function, the “content” was null.

HOWEVER, I wanted to test that theory, so I created a dummy function “getFooBar” which should not be executed. Unfortunately, when I executed the call with the functions appended (including a random one that should not be used), I received the following content instead of a function call request:

 "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To answer the user's question, the following functions are required:\n\n1. `getCurrentTime`: This function will provide the current time.\n\n2. `getNews`: This function will retrieve several current news stories on the US economy.\n\nPlease provide the necessary information to execute these functions."
      },
      "finish_reason": "stop"
    }

While correct, it didn’t return the “finish_reason” “function_call” that is required. :frowning:

2 Likes

You can change the temperature back to your preference whenever you want: use a low temperature only for the function calling step, and a higher temperature for content creation.
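In practice that just means choosing the temperature per request, along these lines (messages_with_result stands for the history including the function output):

    # Low temperature while the model decides on and formats the function call...
    call_resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        temperature=0,
    )
    # ...and a higher temperature for the final, user-facing answer.
    final_resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages_with_result,
        temperature=0.8,
    )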

2 Likes

For anyone else confused about executing multiple functions in a chat pipeline, I’ve pushed a repo to GitHub which will hopefully assist you and demystify functions. The sample executes 3 functions (assuming you have a newsapi API key) and combines the data from all 3 functions.

It doesn’t appear that I can include a clickable link on this forum, but you can find it at: https://github.com/seanconnolly2000/openai-functions-wrapper

Hope this helps!

4 Likes

Amazing. Checking it out. I’ve also realized I may have been making a mistake: I forgot to include GPT’s response in the message history that I pass back to it.
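For anyone hitting the same thing, the round-trip I had in mind looks roughly like this (run_function is a placeholder for your own dispatcher):

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613", messages=messages, functions=functions,
    )
    assistant_msg = resp["choices"][0]["message"]
    messages.append(assistant_msg)  # easy to forget: keep GPT's own turn in the history
    if assistant_msg.get("function_call"):
        result = run_function(assistant_msg["function_call"])  # placeholder executor
        messages.append({
            "role": "function",
            "name": assistant_msg["function_call"]["name"],
            "content": result,  # JSON string produced by your function
        })
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages, functions=functions,
        )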

2 Likes

@t.haferlach - While the results are better, I still agree with your assertion that results are unreliable.

When I use a “system” role preamble, I can confuse ChatGPT. There are times when part of an answer is stored in the “content” AND a function call is requested (I thought it was supposed to be EITHER content is present OR content is null and a function_call is present):

For the following prompt and 3 calls (getDogName, getCurrentUTCDateTime, getNews), on the second call part of the answer is returned directly in the content (see the response below).

Prompt:

    prompt = ("Tell me my dogs name, tell me what time is it in PST, "
              "and give me some news stories about the US Economy.")

Response:

 "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Your dog's name is Spot. The current time in PST is 4:14 PM.",
        "function_call": {
          "name": "getNews",
          "arguments": "{\n  \"q\": \"US Economy\",\n  \"language\": \"en\",\n  \"pageSize\": 5,\n  \"sortBy\": \"publishedAt\"\n}"
        }
      },
      "finish_reason": "function_call"
    }

Notice the “content” AND the function_call request…
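If you’d rather tolerate that than fight it, one option is to simply handle both fields when they are present (display_to_user and handle_function_call are placeholders for your own code):

    msg = response["choices"][0]["message"]
    # Show any partial answer the model produced alongside the call...
    if msg.get("content"):
        display_to_user(msg["content"])
    # ...and still execute the requested function call.
    if msg.get("function_call"):
        handle_function_call(msg["function_call"])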

2 Likes

I find this behavior good, IMHO.
My use case was that GPT should give me a Python function (as content) and then test that Python function through a function call for evaluation.
GPT did exactly what you are describing here. For my use case this is exactly what I wanted, and it works. My function call had the following parameters: “functionName”, “args” and “body”.

So what I ended up doing was having GPT generate a Python function, print it on screen, and at the same time call my function_call with the generated Python function as a request for evaluation. I then evaluated the function for correctness and gave back the result that the function GPT produced passes the tests.
This is a kind of in-context learning, which helps to improve its answers in the follow-up conversation. I am also thinking of implementing some self-reflection with this, and a tree-of-thought process too.

1 Like