Function Calling very unreliable

Interesting suggestion, @PriNova - I’ll look to modify my code to capture any “content” that occurs earlier in the function chaining. I’m actually becoming impressed with the function_call ability - I now have 4 functions chained - getCurrentUTCDateTime (live), getDogsName (silly), getNews (live), and getWeather (live). In my project, I now have a prompt which is answered successfully:

"What is my dog's name, tell me what time is it in PST, what
 is the weather like in London, and what sightseeing activities 
would you recommend for London this time of year?  Also please 
give me 5 articles on the US Economy from the last week."

The response is:

Your dog’s name is Rover.

The current time in PST (Pacific Standard Time) is 6:49 PM.

The weather in London is partly cloudy with a temperature of 18.0°C (64.4°F).

Here are 5 articles on the US Economy from the last week:

  1. Title: “On-Orbit Satellite Servicing, New Crew Capsules and Artificial Gravity: NASA’s Latest Tech Initiative”
    Description: “A new Blue Origin crewed spacecraft is in the works as part of a NASA collaboration designed to advance the orbital economy, with the space agency lending its expertise to seven different commercial partners.”

  2. Title: “Federal Reserve officials announce pause in US interest-rate hikes”
    Description: “Even with the pause, Fed officials suggest further increases may come depending on how close the economy gets to the 2% inflation target. US Federal Reserve officials have announced a pause in interest-rate hikes, leaving rates at 5% to 5.25% after more than a year of…”

  3. Title: “Forget de-dollarization - Argentina may adopt the greenback as its currency and abandon the peso”
    Description: “If Argentina adopts the greenback, it would become the biggest economy so far to dollarize, according to Bloomberg.”

  4. Title: “The US economy is entering expansion, not recession, and investors fear they are missing out, Fundstrat’s Tom Lee says”
    Description: “The US is entering a phase of economic expansion and not a recession, which means the stock-market rally will become more broad-based, Fundstrat’s Tom Lee has said.”

  5. Title: “China’s economy is way more screwed than anyone thought”
    Description: “Wall Street’s dream of a big Chinese boom, post-COVID reopening, has officially gone bust.”

For sightseeing activities in London this time of year, some recommendations would be visiting the Tower of London, taking a boat tour on the River Thames, exploring the British Museum, walking in Hyde Park, and visiting Buckingham Palace.
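For anyone curious about the mechanics, the chaining is roughly the following loop - a minimal sketch with stubbed-out implementations and illustrative parameter names, rather than the exact code:

import json
import openai

# Hypothetical stubs standing in for the real implementations.
def getCurrentUTCDateTime(): ...
def getDogsName(): ...
def getNews(topic, days=7): ...
def getWeather(city): ...

AVAILABLE = {f.__name__: f for f in (getCurrentUTCDateTime, getDogsName, getNews, getWeather)}

def run(messages, functions):
    # 'functions' is the list of function schemas passed to the API.
    # Keep calling the model; each time it requests a function,
    # execute it and append the result, until it answers in plain text.
    while True:
        response = openai.ChatCompletion.create(
            model='gpt-3.5-turbo-0613', messages=messages, functions=functions)
        message = response['choices'][0]['message']
        if not message.get('function_call'):
            return message['content']
        name = message['function_call']['name']
        args = json.loads(message['function_call']['arguments'] or '{}')
        result = AVAILABLE[name](**args)
        messages.append(message)
        messages.append({'role': 'function', 'name': name, 'content': json.dumps(result)})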

4 Likes

If anyone is interested, I’ve also added Pinecone integration as a function call!

EDIT: OK, I’ve now added sendEmail (SendGrid) functionality. This is becoming a beast! ChatGPT actually sent an email on my behalf.
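The email piece is just one more function schema plus a small handler that calls SendGrid - a rough sketch, where the parameter names and addresses are illustrative rather than the exact code:

import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Function schema the model sees; field names here are illustrative.
send_email_function = {
    'name': 'sendEmail',
    'description': "Send an email on the user's behalf",
    'parameters': {
        'type': 'object',
        'properties': {
            'to': {'type': 'string', 'description': 'Recipient email address'},
            'subject': {'type': 'string'},
            'body': {'type': 'string'},
        },
        'required': ['to', 'subject', 'body'],
    },
}

def sendEmail(to, subject, body):
    # Handler invoked when the model calls sendEmail.
    message = Mail(from_email='me@example.com', to_emails=to,
                   subject=subject, plain_text_content=body)
    SendGridAPIClient(os.environ['SENDGRID_API_KEY']).send(message)
    return {'status': 'sent', 'to': to}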

5 Likes

One thing I’ve noticed is that if the user message content contains a JSON object, sometimes the model will incorporate bits of that object into the returned function call - sort of a mashup of the argument schema with the implied schema from the JSON object in the user message content.

In my case, the user message content contained a JSON-encoded dictionary at the end of the user prompt. Pre-0613, I found that this improved the reliability of processing - somehow having it in that structured format really helped when you were trying to get the model to behave a little more predictably. Now, though, I’m having to rework a few of these prompts if I want them to work properly with function calls. It’s not really a problem; it just means I have to use something other than JSON (changing that one thing does wonders).

Also observed here. I saw a workaround where someone tricked it by defining a “multifunction” that takes multiple functions as arguments.

Eg: search_for_user_and_email_them
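Presumably the idea is that the single exposed function takes a list of the real calls as its argument and your code fans them out - a hypothetical sketch of such a schema (the names are made up):

import json

# One 'multifunction' wrapper exposed to the model; the real calls
# arrive as an array in its single 'calls' argument.
multi_function_call = {
    'name': 'multi_function_call',
    'description': 'Run several tool calls in one request',
    'parameters': {
        'type': 'object',
        'properties': {
            'calls': {
                'type': 'array',
                'items': {
                    'type': 'object',
                    'properties': {
                        'name': {'type': 'string', 'description': 'Which real function to run'},
                        'arguments': {'type': 'object'},
                    },
                    'required': ['name', 'arguments'],
                },
            },
        },
        'required': ['calls'],
    },
}

def dispatch(arguments_json, registry):
    # Fan the batched calls out to the real implementations.
    calls = json.loads(arguments_json)['calls']
    return [registry[c['name']](**c['arguments']) for c in calls]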

It is quite annoying and costly; I hope it gets fixed. Prior to functions, I simulated this using !command as a special token, and GPT-4 was able to generate multiple commands in one go. The limitation appears to be arbitrary.

I don’t think this is a problem in practice; at Discourse we simply render something like this to the user:

Seen this as well, but I would blame me… more than it 🙂

A very “obvious in retrospect” issue is that the function result prompt MUST have the function args in it, otherwise the model has no idea what it called:

For example, if you find no results, do not use this as the body:

[]

Instead use:

{
    "results" : [],
    "query": "the man on the moon"
}

That way the model is very unlikely to go into a loop searching for “man on the moon” over and over.
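Concretely, when appending the role “function” message I echo the arguments back alongside the results, roughly like this (a sketch; the helper name is illustrative):

import json

def function_result_message(name, args, results):
    # Echo the arguments back with the results so the model can see
    # what it already searched for and doesn't repeat the call.
    return {
        'role': 'function',
        'name': name,
        'content': json.dumps({'query': args.get('query'), 'results': results}),
    }

messages = []
# Even an empty search result still records the query that produced it.
messages.append(function_result_message('search', {'query': 'the man on the moon'}, []))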

Additionally, I find that adding a little bit of extra guidance in the system prompt helps a fair bit.
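For example, something in this spirit (a hypothetical wording of my own, to be adjusted to your functions):

# Illustrative system-prompt guidance; the exact wording is an assumption, not a quote.
system_message = {
    'role': 'system',
    'content': (
        'You have access to functions for search, weather and news. '
        'Only call a function when you need fresh data. '
        'If a function returns an empty result, do not call it again '
        'with the same arguments; tell the user nothing was found.'
    ),
}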

1 Like

Questions: Is this for a plugin or for API development? If it’s for a plugin, have you also updated your “description_for_model” in the JSON manifest?

JSON manifest: If your plugin is already published, modifying the JSON manifest can cause it to be removed from the plugin store, which is not ideal. However, I know that the “description_for_model” field can affect the output behavior, so as a potential troubleshooting step I would explain in that field how the added function calling works, so that ChatGPT is aligned and not creating a conflict with the code in your main program files.

For example, I am able to do multiple API calls from my plugin (albeit all from the same API), with no native function calling used in my app, so long as all the requests are made in a single prompt (see here: https://chat.openai.com/c/fd9e1299-3cae-4b58-bbc7-fb759af2a061)

Another potential tip if you have a plugin: ask ChatGPT, with the plugin loaded, how your plugin works (where “name_for_model”: “your plugin name” is set in your JSON manifest). It should respond by reading from your “description_for_model” field and giving a high-level overview. If you haven’t built that description out sufficiently, you can paste your main program files into ChatGPT, ask it to summarize the code as a description for the model, and then paste the result into the above-mentioned “description_for_model” field.

I’ve found this can help with a sort of fine-tuning of the output behavior, though I realize there may be better ways to achieve that, since the JSON manifest is not something you can change easily without having the plugin removed from the store if it is already live (it obviously can be experimented with while in development mode). Hope that helps! Cheers.

1 Like

@t.haferlach, did you or anyone else ever figure out how to make it dynamically do more than one call?

I have had no trouble getting GPT to request multiple function calls in the same message, as long as I specifically tell it to do so within my system message prompt.

I’ll be preparing a repository with it shortly, showing what we do in our products. So far, I’ve managed to get 11 unique and useful function calls in the same message. The only time I’ve gotten more calls is when it’s stuck in a loop.

I’ve also found appending the response verbatim to be very expensive on tokens, so I recommend using a continual summarizer.
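By a continual summarizer I mean collapsing each function response into a short summary before it goes back into the message list - roughly like this (the model choice and prompt are placeholders, not what we actually ship):

import openai

def summarize_function_result(name, raw_result, max_words=60):
    # Collapse a verbose function response into a short summary so the
    # running message list doesn't blow up the token budget.
    summary = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user',
                   'content': f'Summarize this {name} result in under {max_words} words:\n{raw_result}'}])
    return summary['choices'][0]['message']['content']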

2 Likes

I was having the same issue, but it turned out to be because I had an errant user message in between function messages.

I would worry that showing the model anything other than the JSON it returned would lead to less reliable function calls. These models are pattern recognizers at the end of the day, so showing them examples of the output you expect reinforces their output. If you change the output pattern, your mileage will vary.

Curious what instruction you’re giving in the prompt and how you’re getting multiple function calls back? The API only allows for a single function call, so I’m assuming you’re having it call some multi_function_call function.

Thanks, @stevenic - I guess that could cause it too. If you look at the posts above in the thread, we solved the infinite loop issue by passing the chat history “messages” to the ChatCompletion call.

2 Likes

I also find that in some cases you get better performance without function calling. For example:

f'pd.dataframe sample of a larger dataset:---{rows}---. Is there at least one column containing either a scientific name string such as a latin name (can be for phylum, family, genus, species, subspecies, etc), or a scientificNameID such as a BOLD ID or an LSID? '

If I append ‘Reply with “yes” or “no”, and no other text.’ to this prompt and send it, and interpret the result with something like if 'no' in response['choices'][0]['message']['content'], I always seem to get the right response.

Whereas if I add function calling like this:

import openai
from pydantic import BaseModel, Field

# Pydantic model whose JSON schema becomes the function's parameter schema.
class DisplayScientificNameFeedbackMessage(BaseModel):
    scientific_name_missing: bool = Field(...)

functions = [{'name': DisplayScientificNameFeedbackMessage.__name__, 'parameters': DisplayScientificNameFeedbackMessage.schema()}]
function_call = {'name': functions[0]['name']}  # force the model to call this function
response = openai.ChatCompletion.create(model='gpt-3.5-turbo-16k', messages=serialized_messages, functions=functions, function_call=function_call)
response_message = response['choices'][0]['message']

Approximately 1 in 5 times, the result of json.loads(response_message['function_call']['arguments'])['scientific_name_missing'] will be incorrect - False when it should be True, or vice versa. Perhaps I’m misunderstanding the way prompts should be used?

I utilize the “confidence score” parameter specified within the function call JSON. In the description, I mention that it should return a confidence score ranging from 0 to 1, with 1 indicating the utmost confidence and 0 suggesting no confidence in its responses. This approach significantly enhanced the stability of my pipeline.
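In the style of the pydantic example above, that looks roughly like this (the field names and descriptions are illustrative, not my exact schema):

from pydantic import BaseModel, Field

class AnswerWithConfidence(BaseModel):
    answer: str = Field(..., description='The answer to return to the user')
    confidence_score: float = Field(
        ..., ge=0.0, le=1.0,
        description='Confidence from 0 (no confidence) to 1 (utmost confidence) in the answer')

functions = [{'name': AnswerWithConfidence.__name__, 'parameters': AnswerWithConfidence.schema()}]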

1 Like