Function Calling very unreliable

I just migrated our chatbot, which previously used system prompts to define functions and parsed the responses by hand, over to the new function calling.

It seems to be worse at getting the syntax of the inputs right than before, and it regularly falls into failure modes where it passes the wrong parameters to the function. After that, it does not recover.

Going to check if I’m making some mistakes in the prompting.

Now my system is actually less flexible than before, because for example:

  • it can only do one call per response
  • there is no documented way to respond while a long-running function has been started but not finished
  • it very easily goes into irrecoverable feedback loops where it keeps repeating its own errors

I also tried with the new GPT-4 model. Same results. And GPT-4 was quite reliable with my previous “manual” method.

Any ideas? Tips?

5 Likes

Consider that the new model might have been trained on how to recognize a function it can call, but it has no training on YOUR function.

I would start by giving it a system prompt that describes the purpose of the chatbot and the functions it can call (more than the functions themselves), and showing it multi-shot example turns of both responding directly and calling a function.
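For example, the few-shot turns could look roughly like this in the messages list (the getWeather function and its arguments are made up purely for illustration):

    # Hypothetical few-shot history: one turn answered directly, one turn that
    # calls a (made-up) getWeather function, followed by its result and answer.
    messages = [
        {"role": "system", "content": "You are a travel assistant. Use the provided functions when the user asks for live data."},
        # Direct answer, no function call
        {"role": "user", "content": "What should I pack for a beach trip?"},
        {"role": "assistant", "content": "Sunscreen, a hat, swimwear, and a light jacket for the evenings."},
        # A function call: the assistant turn carries a function_call instead of text
        {"role": "user", "content": "What's the weather in Lisbon right now?"},
        {"role": "assistant", "content": None,
         "function_call": {"name": "getWeather", "arguments": "{\"city\": \"Lisbon\"}"}},
        # The function result goes back in a "function" role message
        {"role": "function", "name": "getWeather", "content": "{\"temp_c\": 24, \"sky\": \"clear\"}"},
        {"role": "assistant", "content": "It's 24 °C and clear in Lisbon right now."},
    ]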

And then a different prompt for how to present function information.

If the output takes a while to generate from your function call, you can always just throw up your own “processing” message in code.

2 Likes

Consider that the new model might have been trained on how to recognize a function it can call, but it has no training on YOUR function.

But isn’t that what function calling is supposed to solve? I’m explaining my functions to GPT using their schema.

I was thinking about adding example function calls to the system prompt. How would I format the example function calls?

From previous experience with a chatbot running internally in our company: even with a lot of examples, it can start hallucinating incorrect syntax and semantics quite often.

2 Likes

I’m trying an approach:

Generate multiple (e.g. 5) completions.
For each completion that contains a function call:
  if the function exists and the provided parameters match the schema:
    execute the function call
    if it succeeds:
      return the result
If no function call succeeded:
  if a non-function-call completion is present:
    respond with its text
  else:
    report the failure to call the function
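A rough Python sketch of that approach (the jsonschema validation and the local `registry` of callables are my own placeholder choices, not part of the OpenAI API):

    import json
    import jsonschema
    import openai

    def answer_with_retries(messages, functions, registry):
        # Generate several candidate completions in one request.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613",
            messages=messages,
            functions=functions,
            temperature=0.2,
            n=5,
        )
        text_fallback = None
        for choice in resp["choices"]:
            msg = choice["message"]
            call = msg.get("function_call")
            if call is None:
                text_fallback = text_fallback or msg.get("content")
                continue
            name = call["name"]
            schema = next((f for f in functions if f["name"] == name), None)
            if schema is None:
                continue  # hallucinated function name
            try:
                args = json.loads(call["arguments"])
                jsonschema.validate(args, schema["parameters"])
            except (json.JSONDecodeError, jsonschema.ValidationError):
                continue  # malformed or non-conforming arguments
            try:
                return registry[name](**args)  # run the matching local function
            except Exception:
                continue
        if text_fallback:
            return text_fallback
        return "Sorry, I could not produce a valid function call."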

I will also add a list of correct usage examples of the function calls to the beginning of the message history.

1 Like

Did you pass your function call schema to the system prompt?
Did you use low temperature?
Are your function and parameter descriptions concise and coherent?

If all questions can be answered with “Yes”, then there must be other issues.

Could you provide an example?

1 Like

Low temperature: yes.

I’m using OpenAI’s new Function Calling API (described in their API documentation).

This is the new recommended way to describe functions to GPT. With the new models, as far as I understand, the system prompt is only supposed to contain more general instructions that don’t relate directly to function calls.

I also tried putting the schema in both the system message and the functions message with no obvious improvement.
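For reference, a minimal request with the schema passed in the dedicated functions parameter looks roughly like this (the getWeather schema is just an illustrative placeholder):

    import openai

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What's the weather in Berlin?"},
        ],
        functions=[
            {
                "name": "getWeather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            }
        ],
        function_call="auto",  # let the model decide whether to call the function
    )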

1 Like

Would be strange if that were necessary, given that the function description JSON object you pass in is added to the context anyway (that’s why it takes up tokens).

3 Likes

From others’ experience and my own, passing the function object into the system message has given the best results so far, since the new model was fine-tuned for system-message attention. In terms of token count it doesn’t matter where in the request you put it.

1 Like

The function message is the new role for putting the result of a function call back into the conversation. GPT will generate an appropriate answer from this result and the user’s prompt.
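For context, a function result goes back to the model roughly like this (the getNews name and payload are placeholders; the assistant turn that requested the call should already be in `messages`):

    # Append the result as a "function" role message and ask the model
    # for the user-facing answer.
    messages.append({
        "role": "function",
        "name": "getNews",
        "content": "{\"articles\": [{\"title\": \"US economy grows in Q2\"}]}",
    })
    followup = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
    )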

1 Like

So you’re saying there’s a bug or lower performance when doing it the way OpenAI recommends? Their documentation and examples don’t use the system message for functions anymore.

1 Like

The function definitions are appended to the body of the request and injected into the system message. It is important to describe in the system message what the assistant needs to do. A simple “I’m a helpful and friendly assistant” is not enough.

But I would suggest treating a function call as a one-shot request in the system message, even if you are handling a conversation.

1 Like

I’ve noticed that system messages can impact behavior erratically when they are appended subsequent to the user messages. I wonder if the order of appending the function message matters, too.

1 Like

@t.haferlach - I’m going to follow this thread - I’m having problems with multiple calls - are you experiencing this? I will be pushing a simple open-source Python wrapper to GitHub soon that allows you to programmatically create your functions and pass them to ChatGPT so you don’t need to write JSON objects. Basically this:

    # If you've used SqlClient or OracleClient, this is similar.
    # Create your function, and then add parameters.
    # Then you add your function to the "chatFunctions" dictionary object
    # (a dictionary is used to allow subsequent function lookup for security,
    # to make sure you're allowed to execute the function).
    # The to_json() function will turn the dictionary into a list for ChatCompletion consumption.
    f = function(name="getNews", description="News API function")
    f.properties.add(property("module", PropertyType.string, "Python Module to Call to run function", True, default="functions.newsapi_org"))
    f.properties.add(property("q", PropertyType.string, "Query to return news stories", True))
    f.properties.add(property("language", PropertyType.string, "Language of News", True, ["en", "es"], default="en"))
    f.properties.add(property("sortBy", PropertyType.string, "Sort By item", False))
    f.properties.add(property("pageSize", PropertyType.integer, "Page Size", True))
    chatFunctions[f.name] = f

    f = function(name="getCurrentDateTime", description="Obtain the current date time in GMT format")
    chatFunctions[f.name] = f

    prompt = "What time is it and find several current news stories on the US economy."
    res = oai.callwithfunction(prompt, chatFunctions.to_json())

This works well. However, the current issue I am having is that the request goes into an infinite loop: it requests getCurrentDateTime, then requests getNews successfully, but then requests getCurrentDateTime again, then getNews, and so on, rather than finishing with a “stop” finish_reason (request completed).

Any thoughts would be appreciated!

1 Like

I’m seeing the looping behavior happen (especially with the temperature set to low) quite a lot.

I tried to add some sentences to the system prompt to tell it not to go into these kinds of feedback loops. Maybe the “presence penalty” parameter could help.

But as I see it, the problem is that once it has generated a response to a function call followed by another function call, it starts treating that as a pattern and just continues it blindly.

Still trying to figure this part out myself. As a workaround, one could manually tell it not to use a function call after counting, say, two function calls.
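If you count the calls yourself, the function_call request parameter can enforce this: it accepts "none" to force a plain text reply. A rough sketch (the counter variable is something you would maintain yourself):

    # Hypothetical guard: after two consecutive function calls, force a plain
    # text answer by setting function_call="none" for the next request.
    force_text = consecutive_function_calls >= 2   # counter you maintain yourself
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="none" if force_text else "auto",
    )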

@akansel adding the system message at the end regularly makes it completely forget to respond to the last user message. I wouldn’t recommend it.

2 Likes

@t.haferlach - I thought I was onto something when reviewing the returned “content” - the first time I provide the function list, it seems to mirror back the list (my hope was that it knew which functions it wanted). When I ran the subsequent function, the “content” was null.

HOWEVER, I wanted to test that theory, so I created a dummy function “getFooBar” which should not be executed. Unfortunately, when I executed the call with the functions appended (including a random one that should not be used), I received the following content instead of a function call request:

 "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To answer the user's question, the following functions are required:\n\n1. `getCurrentTime`: This function will provide the current time.\n\n2. `getNews`: This function will retrieve several current news stories on the US economy.\n\nPlease provide the necessary information to execute these functions."
      },
      "finish_reason": "stop"
    }

While correct, it didn’t return the “finish_reason” “function_call” that is required. :frowning:

2 Likes

You can change the temperature back to your preference whenever you want: use a low temperature only for the function calling step, and a higher temperature for content creation.
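In practice that just means choosing the temperature per request, along these lines (messages_with_result stands for the history including the function output):

    # Low temperature while the model decides on and formats the function call...
    call_resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        temperature=0,
    )
    # ...and a higher temperature for the final, user-facing answer.
    final_resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages_with_result,
        temperature=0.8,
    )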

2 Likes

For anyone else confused about executing multiple functions in a chat pipeline, I’ve pushed a repo to GitHub which will hopefully assist you and demystify functions. The sample executes 3 functions (assuming you have a newsapi API key) and combines the data from all 3 functions.

It doesn’t appear that I can include a clickable link on this forum, but you can find it at: https://github.com/seanconnolly2000/openai-functions-wrapper

Hope this helps!

4 Likes

Amazing. Checking it out. I’ve also realized I may have been making a mistake: I forgot to include GPT’s response in the message history that I pass back to it.
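For anyone hitting the same thing, the round-trip I had in mind looks roughly like this (run_function is a placeholder for your own dispatcher):

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613", messages=messages, functions=functions,
    )
    assistant_msg = resp["choices"][0]["message"]
    messages.append(assistant_msg)  # easy to forget: keep GPT's own turn in the history
    if assistant_msg.get("function_call"):
        result = run_function(assistant_msg["function_call"])  # placeholder executor
        messages.append({
            "role": "function",
            "name": assistant_msg["function_call"]["name"],
            "content": result,  # JSON string produced by your function
        })
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0613", messages=messages, functions=functions,
        )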

2 Likes

@t.haferlach - While the results are better, I still agree with your assertion that results are unreliable.

When I use a “system” role preamble, I can confuse ChatGPT. There are times when part of an answer is stored in the “content” AND a function call is requested (I thought it was supposed to be EITHER content is present OR content is null and a function_call is present):

For the following prompt and 3 calls (getDogName, getCurrentUTCDateTime, getNews), on the second call part of the answer is returned directly in the content (see the response below).

Prompt:

    prompt = ("Tell me my dogs name, tell me what time is it in PST, "
              "and give me some news stories about the US Economy.")

Response:

 "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Your dog's name is Spot. The current time in PST is 4:14 PM.",
        "function_call": {
          "name": "getNews",
          "arguments": "{\n  \"q\": \"US Economy\",\n  \"language\": \"en\",\n  \"pageSize\": 5,\n  \"sortBy\": \"publishedAt\"\n}"
        }
      },
      "finish_reason": "function_call"
    }

Notice the “content” AND the function_call request…
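If you’d rather tolerate that than fight it, one option is to simply handle both fields when they are present (display_to_user and handle_function_call are placeholders for your own code):

    msg = response["choices"][0]["message"]
    # Show any partial answer the model produced alongside the call...
    if msg.get("content"):
        display_to_user(msg["content"])
    # ...and still execute the requested function call.
    if msg.get("function_call"):
        handle_function_call(msg["function_call"])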

2 Likes

I find this behavior good, IMHO.
My use case was that GPT should give me a Python function (as content) and then test that Python function through a function call for evaluation.
GPT did exactly what you are describing here. For my use case this is exactly what I wanted, and it works. My function call had the following parameters: “functionName”, “args” and “body”.

So what I ended up doing was having GPT generate a Python function, print it on screen, and at the same time call my function_call with the generated Python function as a request for evaluation. I then evaluated the function for correctness and gave back the result that the function GPT produced passes the tests.
This is a kind of in-context learning, which helps to improve its answers in the follow-up conversation. I am also thinking of implementing some self-reflection with this, and a tree-of-thought process too.

1 Like