Streaming responses and tool calls are both neat features on their own, but I’m noticing some weirdness when I try to use both at the same time. For one thing, it appears tool_calls are generated token by token, just like content. I guess that should not be a surprise, but it does complicate reassembling a well-formed tool_calls object for invocation by the shell script.
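To make the reassembly problem concrete, here is a minimal sketch in Python. It works on plain dicts rather than any client library, and it assumes each streamed delta may carry a tool_calls list whose entries have an index, an id (usually only on the first fragment), and a function object with partial name and arguments strings. The function name merge_tool_call_deltas and the sample data are made up for illustration.

```python
import json

def merge_tool_call_deltas(deltas):
    """Merge streamed tool_calls fragments into complete calls, keyed by index."""
    calls = {}
    for delta in deltas:
        # Deltas that only carry content have no tool_calls key; skip them.
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag["index"], {"id": None, "name": "", "arguments": ""}
            )
            if frag.get("id"):
                call["id"] = frag["id"]
            fn = frag.get("function") or {}
            # Name and arguments arrive as string pieces; concatenate in order.
            call["name"] += fn.get("name") or ""
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]

# Hypothetical deltas, shaped like the assumption described above.
sample_deltas = [
    {"tool_calls": [{"index": 0, "id": "call_1",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": "{\"city\": "}}]},
    {"content": "ignored"},
    {"tool_calls": [{"index": 0, "function": {"arguments": "\"Oslo\"}"}}]},
]

merged = merge_tool_call_deltas(sample_deltas)
# Only parse the arguments once the stream has finished; partial JSON will not parse.
parsed_args = json.loads(merged[0]["arguments"])
```

The key point is that the arguments string is only valid JSON once every fragment for that index has arrived, so parsing has to wait until the stream ends.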
It also seems to me that tool_calls might be more prone to hallucinations than the old function_call, especially while streaming. For example, I have repeatedly observed the LM calling a presumably hallucinated “python” tool, and then writing little code blocks to try to test out my tools or to leave comments to itself about what it thinks my intentions were.
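One possible guard against the hallucinated “python” tool is to check each reassembled call against the tools that were actually declared before invoking anything. This is only a sketch: the tool names in KNOWN_TOOLS, the check_tool_call helper, and the dict shape with name and arguments keys are all hypothetical, and it does not reduce how often the hallucination happens, it just catches it.

```python
import json

# Hypothetical names; replace with the tools actually declared in the request.
KNOWN_TOOLS = {"get_weather", "search_docs"}

def check_tool_call(call, known_tools=KNOWN_TOOLS):
    """Return (ok, message); message explains the rejection when ok is False."""
    name = call.get("name", "")
    if name not in known_tools:
        return False, f"Unknown tool '{name}'. Available tools: {sorted(known_tools)}"
    try:
        # Arguments are expected to be a JSON string once fully reassembled.
        json.loads(call.get("arguments") or "{}")
    except json.JSONDecodeError as exc:
        return False, f"Arguments for '{name}' are not valid JSON: {exc.msg}"
    return True, ""

# A hallucinated call of the kind described above, and a legitimate one.
ok_python, msg_python = check_tool_call(
    {"name": "python", "arguments": "print('hi')"}
)
ok_weather, msg_weather = check_tool_call(
    {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"}
)
```

The rejection message could be sent back to the model as the tool result so it has a chance to retry with a real tool, though whether that helps in practice is something I have not verified.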
Does anyone have any tips for getting these features to play nicely together? I’ll post a follow-up if I figure out any good tricks.