Guide on converting ChatCompletionChunk to content parts?

So we can do this easily with the Responses API, but I’m also using models from other LLM providers and accessing them through the OpenAI SDK, so I need to use chat.completions as well. Is there any guide on this?

I do have a working version, but I’d like to see how other people are doing it.


Chat Completions does not emit typed application “events” on top of SSE the way Responses does.

It simply emits a stream of deltas: text to be concatenated (or displayed as it arrives) into message.content, and tool calls to be reassembled from the partial objects. So what you write is more of a “collector” than an event handler. It is not a “handle any stream from the SDK” abstraction, as that would have its own challenges. A minimal sketch of such a collector follows.
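A bare-bones sketch, assuming the official openai Python SDK (v1.x) pointed at any Chat Completions-compatible endpoint; `collect_stream` and the model name are placeholders, and it only reads `choices[0]`:

```python
from openai import OpenAI

client = OpenAI()  # base_url / api_key can target any compatible provider

def collect_stream(stream):
    """Accumulate streamed deltas back into full text + tool calls."""
    content_parts = []
    tool_calls = {}  # tool-call index -> {"id", "name", "arguments"}

    for chunk in stream:
        if not chunk.choices:
            continue  # e.g. a trailing usage-only chunk
        delta = chunk.choices[0].delta
        if delta.content:
            content_parts.append(delta.content)
        # content and tool_calls are NOT mutually exclusive: check both
        for tc in delta.tool_calls or []:
            slot = tool_calls.setdefault(
                tc.index, {"id": None, "name": None, "arguments": ""}
            )
            if tc.id:
                slot["id"] = tc.id
            if tc.function and tc.function.name:
                slot["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                slot["arguments"] += tc.function.arguments  # JSON fragments

    return "".join(content_parts), [tool_calls[i] for i in sorted(tool_calls)]

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
text, calls = collect_stream(stream)
```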

I don’t have a fuller, non-proprietary example to readily share. For feature resilience, you should target Gemini’s Chat Completions compatibility layer, as they “solved” encrypted reasoning and reasoning in tool calls, along with passing thinking summaries and incremental token usage reports. OpenAI’s lock-in attempt of gating model features behind Responses will ultimately be their lock-out if they don’t follow.

Example shape of SSE chunk 0:

data: {"choices":[{"delta":{"content":"Sure!","extra_content":{"google":{"thought_signature":"CtkeAdHtim9..."}},"role":"assistant"},"index":0}],"created":1764000000,"id":"1315156","model":"gemini-x","object":"chat.completion.chunk","usage":{"completion_tokens":64,"prompt_tokens":180,"total_tokens":1180}}

Tip: “content” and “tool_calls” are not mutually exclusive outputs; the AI may produce both, and both need to be acted on.

Yeah haha, it’s very annoying that content and tool calls can be in the same chunk, though some LLM providers don’t do it.

Rather, what I meant to imply is that an unaware programmer might assume, “This has text content, so I don’t need to parse for tool_calls,” whereas the AI has been able to emit both (the leading text is now called a preamble) for over two years.
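To make that concrete, here is the shape in miniature (plain dicts standing in for the SDK objects, with invented values):

```python
# One streamed delta can carry a text preamble AND the start of a tool call.
delta = {
    "role": "assistant",
    "content": "Let me check the weather for you.",  # the preamble
    "tool_calls": [{
        "index": 0,
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": ""},
    }],
}
# Wrong:  if delta["content"]: handle_text(delta["content"]); return
#         (silently drops the tool call)
# Right:  inspect content and tool_calls independently, as the collector does.
```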

What differs is granularity: some providers stream delta chunks that are at most a single token or character each, while others don’t deliberately emit a flood of tiny packets with maximal bandwidth overhead, and instead stream whole sentences or whole tool calls, sized to the HTTP streaming chunk or maximum packet size. (Worse: Responses has figured out how to send the same content multiple times and even echo back inputs and instruction messages.)
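A correct collector is invariant to that granularity, since it only appends text and concatenates argument fragments in index order. A trivial property check:

```python
# Token-sized fragments vs. an entire tool call in one chunk: both reduce to
# the same arguments string when the collector simply concatenates in order.
fine_grained = ['{"ci', 'ty": ', '"Par', 'is"}']  # one fragment per chunk
coarse       = ['{"city": "Paris"}']              # whole call in one chunk
assert "".join(fine_grained) == "".join(coarse)
```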

Yep, makes sense. I think I have something that works; I need to do more testing. You’re right… I’ve noticed Gemini and Mistral sometimes send the entire content along with the entire tool info in a single chunk.

I didn’t know Responses does that :smiling_face_with_tear:. For Claude, when you add thinking before tool_use, it sometimes echoes the reasoning back as a text part.

It’s also strange that thought_signature exists even when no reasoning effort was applied, but maybe Gemini uses a default setting if it’s a reasoning model. From what I’m seeing, it looks like the 2.5 models are the only ones where reasoning can be turned off.
