Streaming structured output tool calls with additional message content

Hi all,

In my experiments I’ve found that it’s possible to stream structured output tool calls. For example, I’ll make a call like this (Swift):

let requestBody = OpenAIChatCompletionRequestBody(
    model: "gpt-4o",
    messages: [
        .user(content: .text("What is the temp in SF?"))
    ],
    tools: [
        .function(
            name: "get_weather",
            description: "Call this when the user wants the weather",
            parameters: <snip>,
            strict: true)
    ]
)

let stream = try await openAIService.streamingChatCompletionRequest(body: requestBody)

and gpt-4o happily complies by chunking up the tool call response:

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_6wxwI1axC8cN5IXuV8NYsy94","type":"function","function":{"name":"get_weather","arguments":""}}],"refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"location"}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"San"}}]},"logprobs":null,"finish_reason":null}],"usage":null}

<snip>
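To turn those chunks back into a usable call, you accumulate the `arguments` fragments keyed by tool call index until the stream finishes. Here’s a minimal sketch of that accumulation — note the `ToolCallDelta` struct is a simplified stand-in for illustration, not the actual library type:

```swift
import Foundation

// Simplified stand-in for the delta payloads shown in the chunks above.
struct ToolCallDelta {
    let index: Int
    let name: String?      // present only on the first chunk for this index
    let arguments: String  // partial JSON fragment
}

// Fragments mirroring the streamed chunks above.
let deltas: [ToolCallDelta] = [
    ToolCallDelta(index: 0, name: "get_weather", arguments: ""),
    ToolCallDelta(index: 0, name: nil, arguments: "{\""),
    ToolCallDelta(index: 0, name: nil, arguments: "location"),
    ToolCallDelta(index: 0, name: nil, arguments: "\":\""),
    ToolCallDelta(index: 0, name: nil, arguments: "San Francisco\"}"),
]

// Accumulate name and argument fragments per tool call index.
var names: [Int: String] = [:]
var arguments: [Int: String] = [:]
for delta in deltas {
    if let name = delta.name { names[delta.index] = name }
    arguments[delta.index, default: ""] += delta.arguments
}

// Once finish_reason arrives, the accumulated string is complete JSON.
let json = try! JSONSerialization.jsonObject(with: Data(arguments[0]!.utf8)) as! [String: String]
print(names[0]!, json["location"]!)  // get_weather San Francisco
```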

So far, so good. The behavior I didn’t expect: it seems I can’t get chat delta content (i.e. non-tool-call content) in the same streaming response. Switching only the messages portion of the call above to:

  .system(content: .text("Greet the user and make tool calls when appropriate")),
  .user(content: .text("Say hello world and then tell me the temp in SF")),

I receive no content in the choices.delta.content key of the chunked responses; they look identical to what I pasted above. However, if I remove part of the prompt:

  .system(content: .text("Greet the user and make tool calls when appropriate")),
  .user(content: .text("Say hello world")),

Then the chunked responses look like this (choices.delta.content is populated):

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":","},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":" world"},"logprobs":null,"finish_reason":null}],"usage":null}

Is this a known limitation of structured outputs tool calls? Can we not get message content in the same response as function call content?

Thanks!
Lou

I wasn’t able to do so, but it should be possible to get the audio and run Whisper on it to convert it to text (ChatGPT seems to save the audio, which is why I think it’s possible).

However, in my attempts, all I got was binary data, so I gave up on it.

If you figure it out, please let me know, or if someone else has, please ping me with a message.

The hackiest way to brute-force this is to “record” what is being spoken and then run Whisper on it. But that’s such an ugly solution that I refused to carry on with it.

Hi anon, I believe you responded to the wrong post. Either that or OpenAI’s forum software is wonky :slight_smile:

Oh sorry, I got confused with the tool calling part as I have a similar issue on the realtime endpoint + function calling.

Is this a known limitation of structured outputs tool calls? Can we not get message content in the same response as function call content?

I wasn’t able to do this. I faced something similar with audio + function calling, but I haven’t used text + function calling, so I’m afraid I won’t be able to help you much here.


For now I’ll assume that a structured output tool call’s response populates either choices.delta.content or choices.delta.tool_calls, but not both.

Would love to hear if others can corroborate this!
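Even under that assumption, a stream consumer can stay defensive and handle both keys per chunk. A minimal sketch, using a simplified `Delta` struct rather than the actual library types:

```swift
// Simplified delta: under the either-or assumption, each chunk carries
// text content or a tool-call fragment, never both.
struct Delta {
    let content: String?
    let toolCallFragment: String?
}

var text = ""
var toolArguments = ""

// Hypothetical chunk sequence for illustration.
let chunks: [Delta] = [
    Delta(content: "Hello", toolCallFragment: nil),
    Delta(content: nil, toolCallFragment: "{\"location\":\"SF\"}"),
]

for delta in chunks {
    if let piece = delta.content { text += piece }                    // plaintext path
    if let piece = delta.toolCallFragment { toolArguments += piece }  // tool-call path
}
print(text)           // Hello
print(toolArguments)  // {"location":"SF"}
```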

The primary objective of structured outputs is to adhere to the JSON schema specification. This is accomplished by constraining the sample space of tokens, based on the specific part of the schema that the tokens are being generated for.

In scenarios where you want the model to also generate a plaintext message to the user in addition to the structured output, it can help to include an additional parameter of type string, such as message, thinking, or reasoning, in the schema. The model then emits your expected user-facing text into that parameter instead of the content attribute.
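For instance, the tool’s parameters schema for the weather example could carry the user-facing text alongside the functional argument. This is a hypothetical schema sketch — the property names (message, location) and descriptions are illustrative, not an API requirement:

```swift
// Hypothetical JSON schema for the get_weather tool's parameters:
// "message" carries the user-facing greeting, "location" the functional argument.
let parameters: [String: Any] = [
    "type": "object",
    "properties": [
        "message": [
            "type": "string",
            "description": "A short plaintext message to show the user before the result",
        ],
        "location": [
            "type": "string",
            "description": "City to fetch the weather for",
        ],
    ],
    "required": ["message", "location"],
    "additionalProperties": false,
]
```

With strict mode, the model is then constrained to populate both fields, so the greeting arrives inside the streamed tool-call arguments rather than in choices.delta.content.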

Here’s an example from the same blog showing a reasoning step for solving a mathematical problem.