Issue with Token Usage in Streaming Responses

I’m encountering an issue with obtaining token usage information when streaming responses from the OpenAI API. According to the API docs, token usage should be included in the response chunks when using the stream_options parameter.

Here’s my setup:

  • OpenAI Python SDK version: openai==1.38.0
  • Python version: 3.11.3

I’ve tried using both asynchronous and synchronous OpenAI client configurations:

import os

from openai import AsyncOpenAI, OpenAI

# client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))  # async variant
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

Request configuration:

response = client.chat.completions.create(
    model="gpt-4o-mini",#also with 3.5 and 4o
    messages=messages,
    stream=True,
    temperature=0.5,
    tool_choice="auto",
    tools=gpt_tools,
    max_tokens=300,
    stream_options={"include_usage": True}
)

for chunk in response:
    print(chunk.usage)

Output

ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content=' today', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content='?', function_call=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
ChatCompletionChunk(id='chatcmpl-9sBbvJTm6vBbDls3RUcInszCd2kqj', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1722701371, model='gpt-3.5-turbo-0125', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)

Formatted Output (Only Usage)

None
None
None
None
None
None
None
None
None
None
None

I’ve tested with different models (gpt-4o-mini, gpt-3.5-turbo) and both async and sync configurations, but the usage field is always None in the response chunks.

Has anyone else experienced this issue or found a solution? Any insights or suggestions would be greatly appreciated!

Thanks in advance!


Did you present this issue to ChatGPT? I am not a coder at all! So when I try to write code and it does not work, I ask ChatGPT and it spits out the resolution pronto!

I thought I had the same issue but I got it to work fine in JS:
request.stream_options = {};
request.stream_options.include_usage = true;

I already have the option set to true.

So you first set the option to an empty object and then add the flag afterwards?

Hi @conciergeai

Usage stats are being returned. Here’s how:

include_usage: bool
If set, an additional chunk will be streamed before the data: [DONE] message.

The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.

Here’s some code to get you started:

from openai import OpenAI

client = OpenAI()
prompt = "Tell me a dad-joke"

response = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[{"role":"user", "content": prompt}],
    stream=True,
    temperature=0.5,
    stream_options={"include_usage": True}
)

for chunk in response:
    if chunk.choices:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")
    
    # Handle the case where choices is empty but usage data is present
    elif chunk.usage:
        print("\n\n", chunk.usage)

I’m printing all the chunks regardless of status, and chunk.usage is always None.


Here is my code snippet:

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))


async def chat_completion(websocket, listen_task, call: Call, solutions: Solutions, llm_model, client_tools):
    try:

        # Append client tools to the LLM tools
        all_tools = []
        for doc in client_tools:
            all_tools.append(doc.get("tool", {}))

        # Initialize default tools for the session
        gpt_tools = default_gpt_tools.copy()

        # Convert tools to respective formats and append to default tools
        gpt_tools.extend(convert_to_gpt_format(all_tools))

        response = await client.chat.completions.create(
            model=llm_model,
            messages=call.messages,
            stream=True,
            temperature=0.5,
            tool_choice="auto",
            tools=gpt_tools,
            max_tokens=300,
            stream_options={"include_usage": True}
            )

        async def text_iterator():
            nonlocal call
            full_resp = ""
            arguments_buffer = ""
            tool_name = None
            async for chunk in response:
                print(chunk)
                if chunk.choices:
                    content = chunk.choices[0].delta.content
                    tool_call = chunk.choices[0].delta.tool_calls
                    finish_reason = chunk.choices[0].finish_reason
                    # if there is content
                    if content is not None:#when content is not a tool
                        full_resp += content
                        #print(content,flush=True,end="")
                        yield content
                    #if call is a tool
                    elif tool_call:
                        for chat_call in tool_call:#loop in the tools
                            if chat_call.function.name:#get tool name
                                tool_name = chat_call.function.name
                                full_resp += f"Just Used: {tool_name}"
                            if chat_call.function.arguments:#get arguments into json-string
                                arguments_buffer += chat_call.function.arguments
                    else:
                        if tool_name:
                            # get tool URL
                            tool_arguments = await get_tool_arguments(tools_data=client_tools, tool_name=tool_name)
                            # get the last 3 messages from the messages list
                            last_messages_str = await get_last_messages(message_list=call.messages)

                            tool_response, category = await agent.resolve_function(tool_name=tool_name,arguments=arguments_buffer,websocket=websocket,language=call.lang,university_id=call.university_id,student_data=call.student_data,tool_arguments=tool_arguments,last_messages=last_messages_str)
                            yield tool_response
                            call.category = category
                        if finish_reason:
                            # print("end of response")
                            # NOTE: breaking here exits the stream at the 'stop' chunk,
                            # before the final usage-only chunk arrives, so the
                            # elif chunk.usage branch below is never reached
                            break
                elif chunk.usage:
                    print("\n\n", chunk.usage)
                        


It prints:

ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content="You're", function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' welcome', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='!', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' How', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' can', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' I', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' assist', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' you', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=' today', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content='?', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)
ChatCompletionChunk(id='chatcmpl-9z6MQoBIMgPrfeNhVtZr1fIYhw6Gm', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1724349486, model='gpt-4o-2024-08-06', object='chat.completion.chunk', service_tier=None, system_fingerprint='fp_d794a2177f', usage=None)

So usage is always None


Hi, any update on this? I am experiencing the same problem and no one seems to give feedback about it.


Does this setting take time to propagate through their servers or something?

Like others who have commented after your solution here, I have done what you’ve posted, yet every usage is null.

  • I changed the setting in my call
  • I’ve made the call a dozen or more times, using different prompts etc.
  • Every single one comes back with null usages for every chunk

It’s like there’s some other setting that’s playing into this, and it’s not as simple as including the include_usage: true flag in stream_options.

What are we missing?

TYIA

Hi @multitechvisions

The usage is only sent in the second-to-last chunk, just before data: [DONE], where choices is an empty array. In the rest of the chunks, it will always be a null value.

It’s really as simple as setting "stream": true and "stream_options" to { "include_usage": true }.
Here’s a cURL call for you to test it:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

Here’s the output that I got:

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"Hi"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" there"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}],"usage":null}

data: {"id":"chatcmpl-xxxxxxxxxxxxx8ZVFOjWthz6MHp","object":"chat.completion.chunk","created":1725882908,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_25624ae3a5","choices":[],"usage":{"prompt_tokens":19,"completion_tokens":10,"total_tokens":29}}

data: [DONE]

Pay close attention to the second-to-last chunk’s usage value above.


The same happens for me with a cURL request: I’m not getting any chunk containing usage info.


Thank you for your reply, it’s greatly appreciated.

I’m at a complete loss here, I apologize if I’m missing something basic.

Here is my code for this particular part; you can see I’m logging and checking whether there is anything inside usage…

  • When I run this, every usage is reported as null
  • Not a single chunk had anything inside usage

And for reference: yes, stream usage is included in the call.

It’s like the server is ignoring the include_usage flag…

  • But I can confirm, through logging the body contents just before we send it out, that all of those settings ARE indeed being sent.

That goes out… but no usage comes back.


@sps You say it’s as easy as adding the flag, and pulling whatever you find out of the usage object… yet it doesn’t work for me. ¯\_(ツ)_/¯

  • I must be missing something
  • There must be something else going on
  • Something NOT on the surface
  • Like a setting or something somewhere that’s turned off… and that’s the reason why this isn’t working
    • Because otherwise… it should be working. lol

Thank you again for your time and brain power!


Note: Continuing this discussion here, for those who land here in the future looking for a solution.

=====================================================

Figured it out :tada:

The problem was that I had a snippet of code that caught the stop signal.

The reason this matters is that the usage chunk comes AFTER the stop signal!

  • Since I was stopping when we received the stop signal, I was not receiving the “2nd to last” chunk… which is the usage chunk.
    • Now that I’m cycling through every piece where piece.startsWith('data: {'), I’m getting usage (see the sketch below).
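
For anyone who wants the same fix in Python, here is a minimal sketch with the openai SDK (the model and prompt are just placeholders, and it assumes OPENAI_API_KEY is set), letting the loop run to exhaustion instead of breaking on finish_reason:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

usage = None
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.content:
            print(delta.content, end="")
        # Don't break on finish_reason == "stop": the usage-only
        # chunk is still to come after it.
    elif chunk.usage:
        # Final chunk: choices is an empty list, usage has the totals.
        usage = chunk.usage

print("\n", usage)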

@mainstreamstudios are you doing something similar with the stop signal?


Hi there,

Today I met the same issue with the streaming API call.

Here is my attempt to use cURL, but with a third-party API:

curl https://api.***/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

And here is the output:

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":"!"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" How"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" can"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" I"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" assist"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" you"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":" today"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{"content":"?"},"logprobs":null,"finish_reason":null}]}

data: {"id":"chatcmpl-A8vmHRzjdgbwTmE62d8J3xy0ECrPG","object":"chat.completion.chunk","created":1726692085,"model":"gpt-4o-2024-05-13","system_fingerprint":"fp_a5d11b2ef2","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}

data: [DONE]

From what I can see, there is no usage property at all. Is it because I’m using a third-party API?
Thanks in advance for your support!

No, still not working…
It seems OpenAI does not really care about this.

Alright, I was having the same issue. As noted above, the usage comes after finish_reason: "stop", but the choices array will be empty at that point, so your code might break; you just need to ensure choice.delta exists before trying to destructure it.

This is what finally worked for me and printed out the usage -

for await (const chunk of response) {
  console.log('Received chunk:', JSON.stringify(chunk, null, 2));
  console.log('usage - ', chunk.usage);
  // On the final usage chunk, choices is an empty array, so
  // choices[0] is undefined; optional chaining guards against that.
  const choice = chunk.choices[0];

  if (choice?.delta) {
    const { content } = choice.delta;
    if (content) {
      ctx.res.write(content);
    }
  }
}

This is correct; the documentation clearly states that the usage data comes in the last chunk, in other words, after finish_reason: 'stop'.

My implementation: whenever I see the 'stop' finish reason, I set a stop flag that doesn’t halt execution but prevents any logic other than receiving one more chunk with the usage data.
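
Here is a minimal sketch of that stop-flag approach in Python (model and prompt are illustrative; assumes OPENAI_API_KEY is set):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

stopped = False  # set once finish_reason == "stop" arrives
for chunk in stream:
    if chunk.choices:
        choice = chunk.choices[0]
        if not stopped and choice.delta.content:
            print(choice.delta.content, end="")
        if choice.finish_reason == "stop":
            stopped = True  # stop emitting text, but keep reading the stream
    elif chunk.usage:
        print("\n", chunk.usage)  # the one extra chunk after 'stop'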