Issue with Token Usage in Streaming Responses

I’m encountering an issue with obtaining token usage information when streaming responses from the OpenAI API. According to the API docs, token usage should be included in the response chunks when using the stream_options parameter.

Here’s my setup:

  • OpenAI Python library: openai==1.38.0

  • Python version: 3.11.3

I’ve tried using both asynchronous and synchronous OpenAI client configurations:

import os

from openai import AsyncOpenAI, OpenAI

# client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))  # async variant
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Request configuration:

response = client.chat.completions.create(
    model="gpt-4o-mini",#also with 3.5 and 4o
    messages=messages,
    stream=True,
    temperature=0.5,
    tool_choice="auto",
    tools=gpt_tools,
    max_tokens=300,
    stream_options={"include_usage": True}
)

for chunk in response:
    print(chunk)        # full chunk (see "Output" below)
    print(chunk.usage)  # usage only (see "Formatted Output" below)

Output

ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content=' today', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content='?', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)

Formatted Output (Only Usage)

None
None
None
None
None
None
None
None
None
None
None

I’ve tested with different models (gpt-4o-mini, gpt-3.5-turbo) and both async and synchronous client configurations, but the usage field is always None in the response chunks.

Has anyone else experienced this issue or found a solution? Any insights or suggestions would be greatly appreciated!

Thanks in advance!


Did you present this issue to ChatGPT? I am not a coder at all! So when I try to write code and it does not work, I ask ChatGPT and it spits out the resolution pronto!

I thought I had the same issue, but I got it to work fine in JS:

request.stream_options = {};
request.stream_options.include_usage = true;

I already have the option set to true.

You first set the option to an empty object and then add the flag?

Hi @conciergeai

Usage stats are being returned. Here’s how:

include_usage: bool
If set, an additional chunk will be streamed before the data: [DONE] message.

The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.

Here’s some code to get you started:

from openai import OpenAI

client = OpenAI()
prompt = "Tell me a dad-joke"

response = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role":"user", "content": prompt}],
    stream=True,
    temperature=0.5,
    stream_options={"include_usage": True}
)

for chunk in response:
    if chunk.choices:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
    
    # Handle the case where choices is empty but usage data is present
    elif chunk.usage:
        print("\n\n", chunk.usage)

I’m printing all the chunks regardless of status, and chunk.usage is always None.
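
One variable worth ruling out is the SDK version itself: stream_options support was added to openai-python fairly recently (around v1.26, if I recall correctly), and older versions won’t send the parameter at all. A quick check:

import openai

# stream_options support landed in the openai-python SDK around v1.26
# (assumption based on the changelog; verify against your installed version)
print(openai.__version__)

Separately, running the script with the environment variable OPENAI_LOG=debug makes the SDK log full request payloads, which shows whether "stream_options": {"include_usage": true} is actually present in the outgoing request.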


Here is my code snippet:

client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


async def chat_completion(websocket, listen_task, call: Call, solutions: Solutions, llm_model, client_tools):
    try:
        # Append the client tools to the LLM tools
        all_tools = []
        for doc in client_tools:
            all_tools.append(doc.get("tool", {}))

        # Initialize default tools for the session
        gpt_tools = default_gpt_tools.copy()

        # Convert tools to the respective format and append to the default tools
        gpt_tools.extend(convert_to_gpt_format(all_tools))

        response = await client.chat.completions.create(
            model=llm_model,
            messages=call.messages,
            stream=True,
            temperature=0.5,
            tool_choice="auto",
            tools=gpt_tools,
            max_tokens=300,
            stream_options={"include_usage": True}
        )

        async def text_iterator():
            nonlocal call
            full_resp = ""
            arguments_buffer = ""
            tool_name = None
            async for chunk in response:
                print(chunk)
                if chunk.choices:
                    content = chunk.choices[0].delta.content
                    tool_call = chunk.choices[0].delta.tool_calls
                    finish_reason = chunk.choices[0].finish_reason
                    if content is not None:  # plain content, not a tool call
                        full_resp += content
                        # print(content, flush=True, end="")
                        yield content
                    elif tool_call:  # the model is calling a tool
                        for chat_call in tool_call:  # loop over the tool calls
                            if chat_call.function.name:  # get the tool name
                                tool_name = chat_call.function.name
                                full_resp += f"Just Used: {tool_name}"
                            if chat_call.function.arguments:  # accumulate the arguments as a JSON string
                                arguments_buffer += chat_call.function.arguments
                    else:
                        if tool_name:
                            # get the tool URL
                            tool_arguments = await get_tool_arguments(tools_data=client_tools, tool_name=tool_name)
                            # get the last 3 messages from the messages list
                            last_messages_str = await get_last_messages(message_list=call.messages)

                            tool_response, category = await agent.resolve_function(tool_name=tool_name, arguments=arguments_buffer, websocket=websocket, language=call.lang, university_id=call.university_id, student_data=call.student_data, tool_arguments=tool_arguments, last_messages=last_messages_str)
                            yield tool_response
                            call.category = category
                        if finish_reason:
                            # print("end of response")
                            break
                elif chunk.usage:
                    print("\n\n", chunk.usage)


It prints:

ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content="You're", function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' welcome', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='!', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' How', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' can', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' I', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' assist', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' you', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' today', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='?', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)

So usage is always None
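
One thing worth double-checking in the snippet above (my reading of it, not something confirmed elsewhere in this thread): the loop breaks as soon as finish_reason arrives, but per the docs quoted earlier, the usage chunk is streamed after the finish_reason chunk, as one extra chunk whose choices list is empty. Breaking on finish_reason therefore exits the loop one chunk too early, before elif chunk.usage ever gets a chance to fire. A minimal sketch of a loop that drains the stream fully:

async def consume_stream(response):
    usage = None
    async for chunk in response:
        if chunk.choices:
            content = chunk.choices[0].delta.content
            if content is not None:
                print(content, end="")
            # note: no break on finish_reason here; the usage chunk still follows
        elif chunk.usage:
            # the extra chunk with an empty choices list carries the stats,
            # and it arrives AFTER the finish_reason chunk
            usage = chunk.usage
    print("\n\n", usage)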


Hi, any update on this? I’m experiencing the same problem and no one seems to have any feedback about it.


Does this setting take time to propagate through their servers or something?

Like others who have commented after your solution, I’ve done exactly what you posted here, yet every usage is null.

  • I changed the setting in my call
  • I’ve made the call a dozen or more times, using different prompts etc.
  • Every single one comes back with null usage for every chunk

It’s like there’s some other setting playing into this, and it’s not as simple as including the include_usage: true flag.

What are we missing?

TYIA

Hi @multitechvisions

The usage is only sent in the second-to-last chunk, just before data: [DONE], where choices is an empty array. In the rest of the chunks, it will always be a null value.

It’s really as simple as setting "stream": true and "stream_options" to { "include_usage": true }.
Here’s a cURL call for you to test it:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Here’s the output that I got:

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"Hi"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" there"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[],"usage":{"prompt_tokens":19,"completion_tokens":10,"total_tokens":29}}

data: [DONE]

Pay close attention to the second-to-last chunk’s usage value.


The same happens for me with a cURL request; I’m not getting any chunk containing usage info.
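
If you want to take the SDK out of the picture entirely, here’s a minimal sketch (using httpx; it assumes OPENAI_API_KEY is set in the environment) that posts the same payload and prints every raw SSE line, so you can see directly whether the "choices": [] usage chunk arrives before data: [DONE]:

import os

import httpx

payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
}

# Stream the response and dump each raw server-sent-event line as-is.
with httpx.stream("POST", "https://api.openai.com/v1/chat/completions",
                  headers=headers, json=payload, timeout=60) as r:
    for line in r.iter_lines():
        if line:  # skip the blank separator lines between events
            print(line)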