The new Functions support that OpenAI rolled out last week sounds awesome and is super promising, but search the forums and you’ll find numerous examples of them simply not working as expected. That’s because functions only solve half the problem of getting structured responses from GPT.
Let’s start with the good… The best part about functions is that OpenAI is fine-tuning the model to return JSON more reliably. In all of my active projects I’m seeing more reliable JSON-based responses across the board, so that in itself is awesome!
The bad with functions, though, is that they’re not reliable, and OpenAI isn’t really in the best position to fix that. Search the forums and you’ll find all sorts of examples of hallucinated functions, missing required parameters, and even malformed JSON. All things that these models are known to do. There’s zero magic at play here. You give the model a list of the functions you support, OpenAI tacks some tokens onto your prompt to express that function list to the model, and the model still generates the same kind of output it would have otherwise. You just have less control over the input.
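To make that concrete, here’s a minimal sketch of what a function-enabled request looked like at the time, assuming the openai Python SDK from that era (0.27.x) and a made-up `get_weather` function. The function list is just a schema the model gets to see; the arguments it returns are ordinary generated text.

```python
# Minimal sketch of the Chat Completions "functions" feature (openai SDK 0.27.x).
# The get_weather function below is a made-up example for illustration.
import openai

functions = [
    {
        "name": "get_weather",  # hypothetical function
        "description": "Get the current weather for a city",
        "parameters": {  # a standard JSON Schema object
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city", "units"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's the weather in Seattle?"}],
    functions=functions,
    function_call="auto",
)

# The "arguments" field is just a string of model-generated JSON. Nothing above
# guarantees it parses, names a real function, or includes the required "units".
print(response.choices[0].message.get("function_call"))
```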
So why do I say that OpenAI isn’t in a great position to fix the issue with functions? Because there is a way to make them reliable. You need to do two things: 1) validate the returned function call against a JSON Schema (OpenAI doesn’t do this), and 2) if you detect an error, tell the model to correct the issue, and it will. The key with this second piece is that you can’t just tell the model what mistake it made, because there’s no guarantee it will correct it. These models have been tuned to follow instructions, so you need to tell the model the exact change you want it to make. If the function call is missing a “foo” parameter, don’t tell it “you forgot a parameter named ‘foo’”; instead instruct it to “add a ‘foo’ parameter”. My AlphaWave project does this, and I literally have not seen a bogus JSON response from any model call in several thousand requests.
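Here’s a minimal sketch of that validate-and-repair loop, assuming the jsonschema package and a hypothetical `call_model()` helper that sends the conversation to the model and returns the raw arguments string. This is the general idea, not AlphaWave’s actual implementation; the key detail is that every schema error becomes an explicit instruction like “add a ‘units’ parameter”.

```python
# Sketch of a validate-and-repair loop: check the model's JSON against a schema
# and, on failure, feed back the exact change to make. call_model() is a
# hypothetical stand-in for however you send messages to the model.
import json
from jsonschema import Draft7Validator

def get_valid_arguments(messages, schema, call_model, max_repairs=3):
    validator = Draft7Validator(schema)
    for _ in range(max_repairs + 1):
        reply = call_model(messages)  # raw arguments string from the model
        try:
            args = json.loads(reply)
        except json.JSONDecodeError:
            # Tell the model exactly what to do, not what it did wrong.
            messages.append({"role": "user", "content": "Return only valid JSON."})
            continue
        errors = list(validator.iter_errors(args))
        if not errors:
            return args
        # Turn each schema error into an explicit corrective instruction.
        fixes = []
        for err in errors:
            if err.validator == "required":
                # err.message looks like "'units' is a required property"
                missing = err.message.split("'")[1]
                fixes.append(f"add a '{missing}' parameter")
            else:
                where = "/".join(map(str, err.path)) or "the response"
                fixes.append(f"change {where} so that {err.message}")
        messages.append({"role": "user", "content": "Update your JSON: " + "; ".join(fixes)})
    raise RuntimeError("model failed to return valid JSON after repairs")
```

In practice the repair prompt usually lands on the first retry, which is why the extra calls are rare even across thousands of requests.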
OpenAI isn’t in a position to fix this core reliability issue because it requires multiple model calls. They don’t have an issue charging you for hidden tokens in your requests, but additional model calls are a completely different story. And that’s what it takes… You need to get the model to correct its mistakes, which can require multiple model calls for the same prompt. How would they bill for that???