I am writing a scientific paper, and I am using function calling.
I ran a test where a required parameter is missing from the user's request. gpt-3.5-turbo-1106 keeps calling the function anyway, filling in the missing parameter with an empty value, even though I explicitly instructed it not to do that.
However, gpt-4-1106-preview does what it should: it asks the user for the missing parameter.
Is there any study that shows the superiority of gpt-4-1106-preview over gpt-3.5-turbo-1106 in function calling?
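For context, here is a minimal sketch of the kind of test I mean. The tool schema and function name (`get_weather`, `city`, `unit`) are hypothetical stand-ins for my actual setup; the check simply flags required parameters that the model's tool call left absent or empty, which is the failure mode I observed with gpt-3.5-turbo-1106.

```python
import json

# Hypothetical function definition in the OpenAI "tools" format;
# the real schema in my test is different but analogous.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "unit"],
        },
    },
}

def missing_required(tool_schema: dict, arguments_json: str) -> list[str]:
    """Return required parameters that are absent or empty in a tool call."""
    args = json.loads(arguments_json)
    required = tool_schema["function"]["parameters"]["required"]
    return [name for name in required if not args.get(name)]

# Simulated tool call like the one gpt-3.5-turbo-1106 produced:
# the missing parameter comes back as an empty string instead of
# the model asking the user for it.
print(missing_required(get_weather_tool, '{"city": "Paris", "unit": ""}'))
# → ['unit']
```

In my runs, gpt-4-1106-preview avoids producing a tool call at all in this situation and instead replies with a clarifying question, so the check above never fires for it.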
GPT-4 is a much bigger model, so it holds more information and understands more of the context of what you're trying to do. I'd say yes, GPT-4 is superior at function calling.
I think general gpt-3.5 vs. gpt-4 performance comparisons should be indication enough. Asking for a study that compares one specific feature across two specific model versions released less than six weeks ago is rather narrow, and I imagine it would have turned up on Google Scholar if one existed.