Hi, you’ve got a good question, because ChatGPT itself will liberally write a whole explanation for you, then finally say “let’s see that in code” and go on to invoke the Python code interpreter, for example.
When the gpt-x.x-0613 models were released, it was relatively easy to get them to similarly talk about things and then call a function within a single API response. As evidence that these are not true “snapshots” but moving targets, though, that habit of writing to you first has since gone away, especially with tools, where some hard-and-fast mechanism ensures that if the AI wants to call a function, you get nothing except that call.
The way streaming works, you can expect, and should parse for, each delta you receive to be either assistant content, a tool_call, or a function_call. The unused fields arrive as empty “None” entries on every chunk, waiting for you to gather up the real ones and, in the case of tool_calls, reassemble the fragments.
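A minimal sketch of that reassembly, assuming chunks shaped like the Chat Completions streaming format (plain dicts here for illustration; the real SDK gives you objects with the same fields):

```python
def accumulate(chunks):
    """Gather assistant content and reassemble fragmented tool_calls
    from streaming deltas."""
    content = ""
    tool_calls = {}  # index -> {"id": ..., "name": ..., "arguments": ...}
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # Unused fields arrive as None; skip them, never concatenate them.
        if delta.get("content") is not None:
            content += delta["content"]
        for tc in delta.get("tool_calls") or []:
            # Parallel calls are distinguished by "index"; arguments for a
            # given index trickle in as string fragments to be joined.
            slot = tool_calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""
    return content, tool_calls
```

Once the stream finishes, each slot’s `arguments` string should parse as the complete JSON for that call.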
Yes, the model will behave differently when there are tools: tools almost implicitly mean you have parallel tool calling enabled, along with the extra context tokens of the injected specification for a multi-function wrapper that multiple tool invocations can be placed in.
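For reference, here is what a request body with a tool looks like; the function name and parameters are hypothetical examples, not anything from your question:

```python
import json

# Hedged sketch of a Chat Completions request body using the "tools"
# parameter. Model choice and the get_weather tool are assumptions.
request = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Weather in Paris and Tokyo?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        },
    ],
}

# With tools present, a single assistant response may come back with
# several entries in message.tool_calls - that is the parallel-call
# wrapper behavior described above.
payload = json.dumps(request)
```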
There’s currently no way to specify both tool_call and function_call in one request, and I expect that the API endpoint that parses the AI response (and lets nothing through that is not well-formed) is also looking for functions and placing them in just one of the API return keys.
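So in practice your parser only ever has to handle one of the two return keys per response. A small sketch of normalizing either shape, using dicts for illustration:

```python
def extract_calls(message: dict) -> list:
    """Return a list of {"name", "arguments"} dicts from an assistant
    message, whichever return key the endpoint populated."""
    # "tools" requests come back under message["tool_calls"] (a list);
    # legacy "functions" requests come back under message["function_call"].
    if message.get("tool_calls"):
        return [tc["function"] for tc in message["tool_calls"]]
    if message.get("function_call"):
        return [message["function_call"]]
    return []  # plain content-only reply
```

Either way you end up with the same list of name/arguments pairs to dispatch on.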