Usage stats now available when using streaming with the Chat Completions API or Completions API

Usage is currently not working properly for curl requests when streaming. More than 9 times out of 10, every chunk has usage: null, without a single chunk providing usage token info.

For Azure OpenAI users: this new feature is supported in the latest preview API version (2024-08-01-preview). You just need to include:

"stream": true,
 "stream_options": {
    "include_usage": true
}

in your HTTP request body, just like with OpenAI's API.
However, for unknown reasons, Azure hasn't mentioned this in the release notes or the API specification, so the update is very easy to miss. I'm leaving a reply here in the hope that it helps someone.

P.S. I happened to find this while searching for “include_usage” in a file called AzureOpenAI/inference/preview/2024-08-01-preview/inference.json (somewhere in Azure’s official GitHub repository).

You can easily confirm this by sending such a request via Postman. However, I don’t know whether Azure OpenAI’s official library supports it.
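
For reference, here is a rough sketch of the raw call you could send from Postman or curl, written with Python's requests just for illustration; the resource name, deployment name, and key variable are placeholders you'd replace with your own:

import json
import os

import requests

resource = "my-resource"    # placeholder: your Azure OpenAI resource name
deployment = "gpt-4o"       # placeholder: your deployment name
url = (
    f"https://{resource}.openai.azure.com/openai/deployments/{deployment}"
    "/chat/completions?api-version=2024-08-01-preview"
)
response = requests.post(
    url,
    headers={"api-key": os.environ["AZURE_OPENAI_API_KEY"]},
    json={
        "messages": [{"role": "user", "content": "Tell me a joke"}],
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    stream=True,
)
for line in response.iter_lines():
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        print(json.loads(line[len(b"data: "):]).get("usage"))  # null until the final chunk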


Thanks for the heads-up!

I can confirm this works with the openai Python library:

import openai

# Assumes AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and OPENAI_API_VERSION
# are set in the environment.
client = openai.AzureOpenAI()
stream = client.chat.completions.create(
    model="gpt-4o",  # your Azure deployment name
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True,
    stream_options={
        "include_usage": True
    }
)
for chunk in stream:
    print(chunk.usage)  # None for content chunks, CompletionUsage on the final chunk
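
Note that only the last chunk carries the numbers: with include_usage enabled, the API appends a final chunk whose choices list is empty and whose usage field holds the token counts, while every other chunk reports usage as None. Here's a small sketch (same client and request as above, just replacing the print loop) that collects the streamed text and reads the totals at the end:

text_parts = []
usage = None
for chunk in stream:
    if chunk.choices:  # normal content chunks
        delta = chunk.choices[0].delta
        if delta.content:
            text_parts.append(delta.content)
    if chunk.usage:    # only the final chunk carries usage
        usage = chunk.usage

print("".join(text_parts))
if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)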