Streaming structured output tool calls with additional message content

Hi all,

In my experiments I’ve found that it’s possible to stream structured output tool calls. For example, I’ll make a call like this (Swift):

let requestBody = OpenAIChatCompletionRequestBody(
    model: "gpt-4o",
    messages: [
        .user(content: .text("What is the temp in SF?"))
    ],
    tools: [
        .function(
            name: "get_weather",
            description: "Call this when the user wants the weather",
            parameters: <snip>,
            strict: true)
    ]
)

let stream = try await openAIService.streamingChatCompletionRequest(body: requestBody)

and gpt-4o happily complies by chunking up the tool call response:

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_6wxwI1axC8cN5IXuV8NYsy94","type":"function","function":{"name":"get_weather","arguments":""}}],"refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"location"}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\""}}]},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-AgkguiqZK56VKmLZjK8lYLnOvJRiz","object":"chat.completion.chunk","created":1734752620,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_d28bcae782","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"San"}}]},"logprobs":null,"finish_reason":null}],"usage":null}

<snip>
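To turn those chunks back into a usable call, you accumulate the `arguments` fragments keyed by tool call index until the stream finishes. Here’s a minimal sketch of that accumulation — note the `ToolCallDelta` struct is a simplified stand-in for illustration, not the actual library type:

```swift
import Foundation

// Simplified stand-in for the delta payloads shown in the chunks above.
struct ToolCallDelta {
    let index: Int
    let name: String?      // present only on the first chunk for this index
    let arguments: String  // partial JSON fragment
}

// Fragments mirroring the streamed chunks above.
let deltas: [ToolCallDelta] = [
    ToolCallDelta(index: 0, name: "get_weather", arguments: ""),
    ToolCallDelta(index: 0, name: nil, arguments: "{\""),
    ToolCallDelta(index: 0, name: nil, arguments: "location"),
    ToolCallDelta(index: 0, name: nil, arguments: "\":\""),
    ToolCallDelta(index: 0, name: nil, arguments: "San Francisco\"}"),
]

// Accumulate name and argument fragments per tool call index.
var names: [Int: String] = [:]
var arguments: [Int: String] = [:]
for delta in deltas {
    if let name = delta.name { names[delta.index] = name }
    arguments[delta.index, default: ""] += delta.arguments
}

// Once finish_reason arrives, the accumulated string is complete JSON.
let json = try! JSONSerialization.jsonObject(with: Data(arguments[0]!.utf8)) as! [String: String]
print(names[0]!, json["location"]!)  // get_weather San Francisco
```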

So far, so good. The behavior I didn’t expect: it seems I can’t get chat delta content (i.e. non-tool-call content) in the same streaming response. Switching only the messages portion of the call above to:

  .system(content: .text("Greet the user and make tool calls when appropriate")),
  .user(content: .text("Say hello world and then tell me the temp in SF")),

I receive no content in the choices.delta.content key of the chunked responses; they look identical to what I pasted above. However, if I remove part of the prompt:

  .system(content: .text("Greet the user and make tool calls when appropriate")),
  .user(content: .text("Say hello world")),

Then the chunked responses look like this (choices.delta.content is populated):

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":","},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-Agks2cMellaZ1OzRefyr25rvfVXN4","object":"chat.completion.chunk","created":1734753310,"model":"gpt-4o-2024-08-06","system_fingerprint":"fp_5f20662549","choices":[{"index":0,"delta":{"content":" world"},"logprobs":null,"finish_reason":null}],"usage":null}

Is this a known limitation of structured outputs tool calls? Can we not get message content in the same response as function call content?

Thanks!
Lou

I wasn’t able to do so, but it should be possible to get the audio and run Whisper on it to convert it to text (ChatGPT seems to save the audio, which is why I think it’s possible).

However, in my attempts, all I got was binary data, so I gave up on it.

If you figure it out, please let me know, or if someone else has, please ping me with a message.

The hackiest way to brute-force this is to “record” what is being spoken and then run Whisper on it. But that’s such an ugly solution that I refused to carry on with it.

Hi anon, I believe you responded to the wrong post. Either that or OpenAI’s forum software is wonky :slight_smile:

Oh sorry, I got confused with the tool calling part as I have a similar issue on the realtime endpoint + function calling.

Is this a known limitation of structured outputs tool calls? Can we not get message content in the same response as function call content?

I wasn’t able to do this. I faced something similar with audio + function calling, but I haven’t used text + function calling, so I’m afraid I won’t be able to help you much here.


For now I’ll assume that a structured output tool call’s response populates either choices.delta.content or choices.delta.tool_calls, but not both.

Would love to hear if others can corroborate this!
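Even under that assumption, a stream consumer can stay defensive and handle both keys per chunk. A minimal sketch, using a simplified `Delta` struct rather than the actual library types:

```swift
// Simplified delta: under the either-or assumption, each chunk carries
// text content or a tool-call fragment, never both.
struct Delta {
    let content: String?
    let toolCallFragment: String?
}

var text = ""
var toolArguments = ""

// Hypothetical chunk sequence for illustration.
let chunks: [Delta] = [
    Delta(content: "Hello", toolCallFragment: nil),
    Delta(content: nil, toolCallFragment: "{\"location\":\"SF\"}"),
]

for delta in chunks {
    if let piece = delta.content { text += piece }                    // plaintext path
    if let piece = delta.toolCallFragment { toolArguments += piece }  // tool-call path
}
print(text)           // Hello
print(toolArguments)  // {"location":"SF"}
```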

The primary objective of structured outputs is to adhere to the JSON schema specification. This is accomplished by constraining the sample space of tokens, based on the specific part of the schema that the tokens are being generated for.

In scenarios where you want the model to also generate a plaintext message to the user in addition to the structured output, it can help to include an additional parameter of type string, such as message, thinking, or reasoning, in the schema. The model then emits your expected user-facing text into that parameter instead of the content attribute.
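For instance, the tool’s parameters schema for the weather example could carry the user-facing text alongside the functional argument. This is a hypothetical schema sketch — the property names (message, location) and descriptions are illustrative, not an API requirement:

```swift
// Hypothetical JSON schema for the get_weather tool's parameters:
// "message" carries the user-facing greeting, "location" the functional argument.
let parameters: [String: Any] = [
    "type": "object",
    "properties": [
        "message": [
            "type": "string",
            "description": "A short plaintext message to show the user before the result",
        ],
        "location": [
            "type": "string",
            "description": "City to fetch the weather for",
        ],
    ],
    "required": ["message", "location"],
    "additionalProperties": false,
]
```

With strict mode, the model is then constrained to populate both fields, so the greeting arrives inside the streamed tool-call arguments rather than in choices.delta.content.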

Here’s an example from the same blog showing a reasoning step for solving a mathematical problem.