Hi all,
In my experiments I’ve found that it’s possible to stream structured output tool calls. For example, I’ll make a call like this (Swift):
let requestBody = OpenAIChatCompletionRequestBody(
    model: "gpt-4o",
    messages: [
        .user(content: .text("What is the temp in SF?"))
    ],
    tools: [
        .function(
            name: "get_weather",
            description: "Call this when the user wants the weather",
            parameters: <snip>,
            strict: true)
    ]
)
let stream = try await openAIService.streamingChatCompletionRequest(body: requestBody)
and gpt-4o happily complies by chunking up the tool call response:
data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_6wxwI1axC8cN5IXuV8NYsy94","type":"function","function":{"name":"get_weather","arguments":""}}],"refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"location"}}]},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"San"}}]},"logprobs":null,"finish_reason":null}],"usage":null}
<snip>
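For reference, here is a minimal sketch of how those fragments can be stitched back together, using only Foundation. The `Chunk` types and the `assembleToolCalls` helper are hypothetical names made up for this post, not part of any library, and they decode only the fields visible in the chunks above.

```swift
import Foundation

// Hypothetical minimal model of one streamed chunk. Only the fields
// needed to reassemble tool calls are declared; everything else is ignored.
struct Chunk: Decodable {
    struct Choice: Decodable {
        struct Delta: Decodable {
            struct ToolCall: Decodable {
                struct Function: Decodable {
                    let name: String?
                    let arguments: String?
                }
                let index: Int
                let function: Function?
            }
            let content: String?
            let toolCalls: [ToolCall]?   // maps from "tool_calls"
        }
        let delta: Delta
    }
    let choices: [Choice]
}

// Concatenates the `arguments` fragments for each tool call index.
// `lines` are raw SSE lines of the form `data: {...}`.
func assembleToolCalls(from lines: [String]) -> [Int: (name: String, arguments: String)] {
    var calls: [Int: (name: String, arguments: String)] = [:]
    let decoder = JSONDecoder()
    decoder.keyDecodingStrategy = .convertFromSnakeCase
    let prefix = "data: "
    for line in lines {
        guard line.hasPrefix(prefix) else { continue }
        let payload = String(line.dropFirst(prefix.count))
        // Skip the end-of-stream sentinel and anything that fails to decode.
        guard payload != "[DONE]",
              let data = payload.data(using: .utf8),
              let chunk = try? decoder.decode(Chunk.self, from: data) else { continue }
        for choice in chunk.choices {
            for call in choice.delta.toolCalls ?? [] {
                var entry = calls[call.index] ?? (name: "", arguments: "")
                if let name = call.function?.name { entry.name = name }
                entry.arguments += call.function?.arguments ?? ""
                calls[call.index] = entry
            }
        }
    }
    return calls
}
```

The function name arrives only in the first chunk for a given index, so the sketch keeps it and appends every later `arguments` fragment to the same entry. The assembled string is only valid JSON once the stream has finished.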
So that is all fine and good. What I didn’t expect is that I can’t seem to get chat delta content (i.e. non-tool-call content) in the same streaming response. Switching only the messages portion of the call above to:
.system(content: .text("Greet the user and make tool calls when appropriate")),
.user(content: .text("Say hello world and then tell me the temp in SF")),
I receive no content in the choices.delta.content key of the chunked responses. That is, the chunked responses look identical to what I pasted above. Now, if I remove part of the prompt:
.system(content: .text("Greet the user and make tool calls when appropriate")),
.user(content: .text("Say hello world")),
Then the chunked responses look like this (choices.delta.content is populated):
data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":","},"logprobs":null,"finish_reason":null}],"usage":null}
data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":" world"},"logprobs":null,"finish_reason":null}],"usage":null}
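To check this without eyeballing every chunk, here is a rough sketch that classifies each streamed line by what its delta carries. `DeltaKind` and `deltaKind(ofLine:)` are hypothetical names for this post, and the sketch assumes the chunk shape shown above, looking only at the first choice.

```swift
import Foundation

// Hypothetical classification of what a single chunk's delta contains.
enum DeltaKind { case text, toolCall, both, neither }

// Inspects one raw SSE line (`data: {...}`) and reports whether the first
// choice's delta has non-empty `content`, a non-empty `tool_calls` array,
// both, or neither. Unparseable lines are reported as `.neither`.
func deltaKind(ofLine line: String) -> DeltaKind {
    let prefix = "data: "
    guard line.hasPrefix(prefix),
          let data = String(line.dropFirst(prefix.count)).data(using: .utf8),
          let root = (try? JSONSerialization.jsonObject(with: data)) as? [String: Any],
          let choices = root["choices"] as? [[String: Any]],
          let delta = choices.first?["delta"] as? [String: Any] else {
        return .neither
    }
    // A null or missing `content` is treated the same as an empty string.
    let text = (delta["content"] as? String) ?? ""
    let toolCalls = (delta["tool_calls"] as? [Any]) ?? []
    switch (!text.isEmpty, !toolCalls.isEmpty) {
    case (true, true): return .both
    case (true, false): return .text
    case (false, true): return .toolCall
    case (false, false): return .neither
    }
}
```

Running every line of a response through this and counting the results would show whether any `.text` or `.both` chunks appear alongside the `.toolCall` ones.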
Is this a known limitation of structured outputs tool calls? Can we not get message content in the same response as function call content?
Thanks!
Lou