The gpt-3.5-turbo-1106 model often responds with function calls when they are not needed, populating the arguments with strange (and potentially biased) values.
I noticed this issue when developing a drug price retrieval application. I defined the following function:
```json
{
  "name": "getDrugs",
  "description": "Get drug NDCs and other information",
  "parameters": {
    "type": "object",
    "properties": {
      "drugName": { "type": "string" },
      "treats": { "type": "string" }
    }
  }
}
```
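For reference, here is roughly the shape of the request involved, as a minimal sketch using the v1 openai Python SDK (the helper name is mine and my actual retrieval code is omitted):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The getDrugs function above, wrapped in the tools format the API expects
GET_DRUGS_TOOL = {
    "type": "function",
    "function": {
        "name": "getDrugs",
        "description": "Get drug NDCs and other information",
        "parameters": {
            "type": "object",
            "properties": {
                "drugName": {"type": "string"},
                "treats": {"type": "string"},
            },
        },
    },
}

def ask(user_message: str, **kwargs):
    # Send a single user message alongside the tool definition
    return client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[{"role": "user", "content": user_message}],
        tools=[GET_DRUGS_TOOL],
        **kwargs,
    )
```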
To test the code, I called the API with the user message “What is the capital of France?” and I was surprised when GPT responded with two function calls:
```json
[
  {
    "id": "call_MvVSLlUgc9XrQX3XEa8i2lSE",
    "type": "function",
    "function": {
      "name": "getDrugs",
      "arguments": "{\"treats\": \"hypertension\"}"
    }
  },
  {
    "id": "call_gzXWhChMYiu8tddBdBYdzLpw",
    "type": "function",
    "function": {
      "name": "getDrugs",
      "arguments": "{\"drugName\": \"aspirin\"}"
    }
  }
]
```
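For context, an application typically branches on whether the response contains tool calls, so spurious calls like these send the request down the function-execution path instead of just answering the user. A rough sketch of such a handler (hypothetical code, assuming the v1 SDK response shape):

```python
import json

message = ask("What is the capital of France?").choices[0].message

if message.tool_calls:
    # The model asked for one or more function executions
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        print(f"Model requested {call.function.name} with {args}")
        # ...this is where the real getDrugs lookup would run
else:
    # Plain text answer, no function needed
    print(message.content)
```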
Of course, GPT should not need any functions to answer this question, especially not a drug information retrieval function. Things get even more bizarre when you try different countries, with different types of behavior occurring more often for certain countries:
- For some countries (e.g. England, Spain, and India), GPT generally responds correctly with no function calls.
- For some countries, GPT generally populates the function arguments in a strange way: e.g. for Peru, the `treats` argument was populated with “Capital of Peru” or “Peru”. Bizarre!
- For some countries, GPT sometimes responds with arguments that seem specific to that country: e.g. for African countries, the `treats` argument is sometimes populated with “Malaria” and “HIV”, whereas I have never seen this for European countries. Of the limited examples I’ve explored, this seems to happen most often with South Africa.
These behaviors occur across a range of temperature values. And while the choice of country seems to influence the behavior, all queries exhibit some variation: with France, sometimes a correct response is returned, and sometimes the arguments are bizarrely populated in a way similar to Peru. Also interestingly, when erroneous function calls are made, the model almost always makes two of them. I’ve noticed this bizarre behavior before with completely different functions.
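If you want to quantify the variation yourself, a rough harness along these lines (hypothetical code, reusing the ask helper from the sketch above) counts how often function calls come back per country and temperature:

```python
from collections import Counter

countries = ["France", "England", "Spain", "India", "Peru", "South Africa"]
temperatures = [0.0, 0.7, 1.0]
counts = Counter()

for country in countries:
    for temp in temperatures:
        for _ in range(10):  # repeated trials per setting
            resp = ask(f"What is the capital of {country}?", temperature=temp)
            n_calls = len(resp.choices[0].message.tool_calls or [])
            counts[(country, temp)] += int(n_calls > 0)

for key, spurious in sorted(counts.items()):
    print(key, f"{spurious}/10 responses made function calls")
```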
All of this occurred with no system prompt. The use of an appropriate system prompt (e.g. “Only respond with functions when required”) reduces this behavior, but I don’t think this should be necessary. Besides, this behavior does not seem to occur with gpt-3.5-turbo-0613.
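Concretely, the mitigation is just prepending a system message to the request, e.g. (same sketch style as above):

```python
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {"role": "system",
         "content": "Only respond with functions when required."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    tools=[GET_DRUGS_TOOL],
)
```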
If you’d like to try this out for yourself, I’ve made an Observable notebook showing the problem (which I am unable to link here, but the notebook address is @siliconjazz/current-problem-with-gpt-3-5-turbo-1106-function-calling), or you can paste the function defined above into the playground.
EDIT: The issue seems to occur more often when there is no system prompt. Even a basic “You are a helpful assistant” prompt reduces the frequency of this behavior, but it doesn’t remove it completely. See _j’s response (unable to link directly due to a 422 error).