We have automated tests that check whether we implemented our function calling properly. The tests look something like this (shown for context):
await testToolPrompt(['Hours on job 12345'], {
  name: 'my_tool',
  arguments: {
    jobId: 12345,
    isHoursRequested: true,
  },
});
The test calls the completion function hundreds of times to make sure the AI returns the proper arguments (the second argument is what we expect). From the pass/fail statistics we can tell whether a function definition is off and needs additional changes.
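For anyone curious, a minimal sketch of what such a harness could look like, assuming the OpenAI Node SDK; `testToolPrompt` is our own helper, and the model name and tool schema here are placeholders:

import OpenAI from 'openai';
import assert from 'node:assert';

const openai = new OpenAI();

async function testToolPrompt(prompts, expected, runs = 100) {
  let passed = 0;
  for (let i = 0; i < runs; i++) {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o', // placeholder model
      messages: prompts.map((content) => ({ role: 'user', content })),
      tools: [/* our JSON schema for `my_tool` goes here */],
    });
    // The SDK returns tool calls with arguments as a JSON string.
    const call = completion.choices[0].message.tool_calls?.[0];
    try {
      assert.strictEqual(call?.function.name, expected.name);
      assert.deepStrictEqual(
        JSON.parse(call.function.arguments),
        expected.arguments,
      );
      passed++;
    } catch {
      // Count as a failure and keep going; we only care about the totals.
    }
  }
  console.log(`${passed}/${runs} runs returned the expected tool call`);
}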
I’ve played with temperature, and to my surprise it has a significant effect on the quality of function calling responses. I naively assumed function calling isn’t just text generation and should be precise at any given temperature. With temperature 0 we get all the arguments and values we expect, but with temperature 2 the model generates random arguments that make no sense at all. Temperatures between 0 and 2 give middling but still low-quality results.
We want to keep the ability to set temperature for general text generation, but we also need our functions to receive precise arguments. Is there a workaround that lets us keep using temperature without it breaking our function calling? Thank you!
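To make the question concrete, the shape we’re hoping for is something like a per-request override (just a sketch of the idea; `createCompletion` is our own hypothetical wrapper, not a library API):

// Force a near-deterministic temperature only on requests that include
// tools, while leaving the configured temperature in place for plain
// text generation. Is this the right approach, or is there a better way?
async function createCompletion({ messages, tools, temperature }) {
  return openai.chat.completions.create({
    model: 'gpt-4o', // placeholder model
    messages,
    tools,
    temperature: tools?.length ? 0 : temperature,
  });
}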