Function Calling very unreliable

Interesting suggestion, @PriNova - I’ll look to modify my code to capture any “content” that occurs earlier in the function chaining. I’m actually becoming impressed with the function_call ability - I now have 4 functions chained - getCurrentUTCDateTime (live), getDogsName (silly), getNews (live), and getWeather (live). In my project, I now have a prompt which is answered successfully:

"What is my dog's name, tell me what time is it in PST, what
 is the weather like in London, and what sightseeing activities 
would you recommend for London this time of year?  Also please 
give me 5 articles on the US Economy from the last week."

The response is:

Your dog’s name is Rover.

The current time in PST (Pacific Standard Time) is 6:49 PM.

The weather in London is partly cloudy with a temperature of 18.0°C (64.4°F).

Here are 5 articles on the US Economy from the last week:

  1. Title: “On-Orbit Satellite Servicing, New Crew Capsules and Artificial Gravity: NASA’s Latest Tech Initiative”
    Description: “A new Blue Origin crewed spacecraft is in the works as part of a NASA collaboration designed to advance the orbital economy, with the space agency lending its expertise to seven different commercial partners.”

  2. Title: “Federal Reserve officials announce pause in US interest-rate hikes”
    Description: “Even with the pause, Fed officials suggest further increases may come depending on how close the economy gets to the 2% inflation target. US Federal Reserve officials have announced a pause in interest-rate hikes, leaving rates at 5% to 5.25% after more than a year of…”

  3. Title: “Forget de-dollarization - Argentina may adopt the greenback as its currency and abandon the peso”
    Description: “If Argentina adopts the greenback, it would become the biggest economy so far to dollarize, according to Bloomberg.”

  4. Title: “The US economy is entering expansion, not recession, and investors fear they are missing out, Fundstrat’s Tom Lee says”
    Description: “The US is entering a phase of economic expansion and not a recession, which means the stock-market rally will become more broad-based, Fundstrat’s Tom Lee has said.”

  5. Title: “China’s economy is way more screwed than anyone thought”
    Description: “Wall Street’s dream of a big Chinese boom, post-COVID reopening, has officially gone bust.”

For sightseeing activities in London this time of year, some recommendations would be visiting the Tower of London, taking a boat tour on the River Thames, exploring the British Museum, walking in Hyde Park, and visiting Buckingham Palace.
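For anyone curious about the mechanics, the chaining is roughly the following loop - a minimal sketch with stubbed-out implementations and illustrative parameter names, rather than the exact code:

import json
import openai

# Hypothetical stubs standing in for the real implementations.
def getCurrentUTCDateTime(): ...
def getDogsName(): ...
def getNews(topic, days=7): ...
def getWeather(city): ...

AVAILABLE = {f.__name__: f for f in (getCurrentUTCDateTime, getDogsName, getNews, getWeather)}

def run(messages, functions):
    # 'functions' is the list of function schemas passed to the API.
    # Keep calling the model; each time it requests a function,
    # execute it and append the result, until it answers in plain text.
    while True:
        response = openai.ChatCompletion.create(
            model='gpt-3.5-turbo-0613', messages=messages, functions=functions)
        message = response['choices'][0]['message']
        if not message.get('function_call'):
            return message['content']
        name = message['function_call']['name']
        args = json.loads(message['function_call']['arguments'] or '{}')
        result = AVAILABLE[name](**args)
        messages.append(message)
        messages.append({'role': 'function', 'name': name, 'content': json.dumps(result)})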

4 Likes

If anyone is interested, I’ve also added Pinecone integration as a function call!

EDIT: OK, I’ve now added sendEmail (SendGrid) functionality. This is becoming a beast! ChatGPT actually sent an email on my behalf.
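The email piece is just one more function schema plus a small handler that calls SendGrid - a rough sketch, where the parameter names and addresses are illustrative rather than the exact code:

import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Function schema the model sees; field names here are illustrative.
send_email_function = {
    'name': 'sendEmail',
    'description': "Send an email on the user's behalf",
    'parameters': {
        'type': 'object',
        'properties': {
            'to': {'type': 'string', 'description': 'Recipient email address'},
            'subject': {'type': 'string'},
            'body': {'type': 'string'},
        },
        'required': ['to', 'subject', 'body'],
    },
}

def sendEmail(to, subject, body):
    # Handler invoked when the model calls sendEmail.
    message = Mail(from_email='me@example.com', to_emails=to,
                   subject=subject, plain_text_content=body)
    SendGridAPIClient(os.environ['SENDGRID_API_KEY']).send(message)
    return {'status': 'sent', 'to': to}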

5 Likes

One thing I’ve noticed is that if the user message content contains a JSON object, sometimes the model will incorporate bits of that object into the returned function call - sort of a mashup of the argument schema with the implied schema from the JSON object in the user message content.

In my case, the user message content contained a JSON-encoded dictionary at the end of the user prompt. Pre-0613, I found that this improved the reliability of processing - somehow having it in that structured format really helped when you were trying to get the model to behave a little more predictably. Now, though, I’m having to rework a few of these prompts if I want them to work properly with function calls. It’s not really a problem; it just means I have to use something other than JSON (changing that one thing does wonders).

Also observed here. I saw a workaround where someone tricked it by defining a “multifunction” that takes multiple functions as arguments.

Eg: search_for_user_and_email_them
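Presumably the idea is that the single exposed function takes a list of the real calls as its argument and your code fans them out - a hypothetical sketch of such a schema (the names are made up):

import json

# One 'multifunction' wrapper exposed to the model; the real calls
# arrive as an array in its single 'calls' argument.
multi_function_call = {
    'name': 'multi_function_call',
    'description': 'Run several tool calls in one request',
    'parameters': {
        'type': 'object',
        'properties': {
            'calls': {
                'type': 'array',
                'items': {
                    'type': 'object',
                    'properties': {
                        'name': {'type': 'string', 'description': 'Which real function to run'},
                        'arguments': {'type': 'object'},
                    },
                    'required': ['name', 'arguments'],
                },
            },
        },
        'required': ['calls'],
    },
}

def dispatch(arguments_json, registry):
    # Fan the batched calls out to the real implementations.
    calls = json.loads(arguments_json)['calls']
    return [registry[c['name']](**c['arguments']) for c in calls]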

It is quite annoying and costly; I hope it gets fixed. Prior to functions, I simulated this using !command as a special token, and GPT-4 was able to generate multiple commands in one go. The limitation appears to be arbitrary.

I don’t think this is a problem in practice; at Discourse we simply render something like this to the user:

Seen this as well, but I would blame me… more than it 🙂

A very “obvious in retrospect” issue is that the function result prompt MUST have the function args in it, otherwise the model has no idea what it called:

For example, if you find no results, do not use this as the body:

[]

Instead use:

{
    "results" : [],
    "query": "the man on the moon"
}

That way the model is very unlikely to go into a loop searching for “man on the moon” over and over.
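Concretely, when appending the role “function” message I echo the arguments back alongside the results, roughly like this (a sketch; the helper name is illustrative):

import json

def function_result_message(name, args, results):
    # Echo the arguments back with the results so the model can see
    # what it already searched for and doesn't repeat the call.
    return {
        'role': 'function',
        'name': name,
        'content': json.dumps({'query': args.get('query'), 'results': results}),
    }

messages = []
# Even an empty search result still records the query that produced it.
messages.append(function_result_message('search', {'query': 'the man on the moon'}, []))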

Additionally, I find that adding a little bit of extra guidance in the system prompt helps a fair bit.
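For example, something in this spirit (a hypothetical wording of my own, to be adjusted to your functions):

# Illustrative system-prompt guidance; the exact wording is an assumption, not a quote.
system_message = {
    'role': 'system',
    'content': (
        'You have access to functions for search, weather and news. '
        'Only call a function when you need fresh data. '
        'If a function returns an empty result, do not call it again '
        'with the same arguments; tell the user nothing was found.'
    ),
}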

1 Like

Questions: Is this for a plugin or for API development? If it’s for a plugin, have you also updated your “description_for_model” in the JSON manifest?

JSON manifest: If your plugin is already published, modifying the JSON manifest can cause it to be removed from the plugin store, which is not ideal. However, I know that the “description_for_model” field can affect the output behavior, so as a potential troubleshooting step I would explain in that field how the added function calling works, so that ChatGPT is aligned and not creating a conflict with the code in your main program files.

For example, I am able to do multiple API calls from my plugin (albeit all from the same API), with no native function calling used in my app, so long as all the requests are made in a single prompt (see here: https://chat.openai.com/c/fd9e1299-3cae-4b58-bbc7-fb759af2a061)

Another potential tip if you have a plugin: ask ChatGPT, with the plugin loaded, how your plugin works (where “name_for_model”: “your plugin name” is set in your JSON manifest). It should respond by reading from your “description_for_model” field and giving a high-level overview. If you haven’t built that description out sufficiently, you can paste your main program files into ChatGPT, ask it to summarize the code as a description for the model, and then paste the result into the above-mentioned “description_for_model” field.

I’ve found this can help with a sort of fine-tuning of the output behavior, though I realize there may be better ways to achieve that, since the JSON manifest is not something you can change easily without having the plugin removed from the store if it is already live (it obviously can be experimented with while in development mode). Hope that helps! Cheers.

1 Like

@t.haferlach, did you or anyone else ever figure out how to make it dynamically do more than one call?

I have had no trouble getting GPT to request multiple function calls in the same message, as long as I specifically tell it to do so within my system message prompt.

I’ll be preparing a repository with it shortly, showing what we do in our products. So far, I’ve managed to get 11 unique and useful function calls in the same message. The only time I’ve gotten more calls is when it’s stuck in a loop.

I’ve also found appending the response verbatim to be very expensive on tokens, so I recommend using a continual summarizer.
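By a continual summarizer I mean collapsing each function response into a short summary before it goes back into the message list - roughly like this (the model choice and prompt are placeholders, not what we actually ship):

import openai

def summarize_function_result(name, raw_result, max_words=60):
    # Collapse a verbose function response into a short summary so the
    # running message list doesn't blow up the token budget.
    summary = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user',
                   'content': f'Summarize this {name} result in under {max_words} words:\n{raw_result}'}])
    return summary['choices'][0]['message']['content']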

2 Likes

I was having the same issue, but it turned out to be because I had an errant user message in between function messages.

I would worry that showing the model anything other than the JSON it returned would lead to less reliable function calls. These models are pattern recognizers at the end of the day, so showing them examples of the output you expect reinforces their output. If you change the output pattern, your mileage will vary.

Curious what instruction you’re giving in the prompt and how you’re getting multiple function calls back? The API only allows for a single function call, so I’m assuming you’re having it call some multi_function_call function.

Thanks, @stevenic - I guess that could cause it too. If you look at the posts above in the thread, we solved the infinite loop issue by passing the chat history “messages” to the ChatCompletion call.

2 Likes

I also find that in some cases you get better performance without function calling. For example:

f'pd.dataframe sample of a larger dataset:---{rows}---. Is there at least one column containing either a scientific name string such as a latin name (can be for phylum, family, genus, species, subspecies, etc), or a scientificNameID such as a BOLD ID or an LSID? '

If I append ‘Reply with “yes” or “no”, and no other text.’ to this prompt and send it, and interpret the result with something like if 'no' in response['choices'][0]['message']['content'], I always seem to get the right response.

Whereas if I add function calling like this:

import openai
from pydantic import BaseModel, Field

# Pydantic model whose JSON schema becomes the function's parameter schema.
class DisplayScientificNameFeedbackMessage(BaseModel):
    scientific_name_missing: bool = Field(...)

functions = [{'name': DisplayScientificNameFeedbackMessage.__name__, 'parameters': DisplayScientificNameFeedbackMessage.schema()}]
function_call = {'name': functions[0]['name']}  # force the model to call this function
response = openai.ChatCompletion.create(model='gpt-3.5-turbo-16k', messages=serialized_messages, functions=functions, function_call=function_call)
response_message = response['choices'][0]['message']

Approximately 1 in 5 times, the result of json.loads(response_message['function_call']['arguments'])['scientific_name_missing'] will be incorrect - False when it should be True, or vice versa. Perhaps I’m misunderstanding the way prompts should be used?

I utilize the “confidence score” parameter specified within the function call JSON. In the description, I mention that it should return a confidence score ranging from 0 to 1, with 1 indicating the utmost confidence and 0 suggesting no confidence in its responses. This approach significantly enhanced the stability of my pipeline.
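In the style of the pydantic example above, that looks roughly like this (the field names and descriptions are illustrative, not my exact schema):

from pydantic import BaseModel, Field

class AnswerWithConfidence(BaseModel):
    answer: str = Field(..., description='The answer to return to the user')
    confidence_score: float = Field(
        ..., ge=0.0, le=1.0,
        description='Confidence from 0 (no confidence) to 1 (utmost confidence) in the answer')

functions = [{'name': AnswerWithConfidence.__name__, 'parameters': AnswerWithConfidence.schema()}]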

1 Like