When streaming with the Chat Completions or Completions APIs, you can now request an additional chunk at the end of the stream that contains the “usage stats”, such as the number of tokens generated in the entire completion. Previously, this usage data was not available when streaming.
Just set stream_options: {"include_usage": true} (API reference) in your request, and you will receive an additional final response chunk containing the usage data for your entire request/response.
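As a rough sketch of the request side, the snippet below assembles such a request body in Python. The helper name `build_streaming_request` is hypothetical and not part of any SDK; it only builds the JSON body and does not send it.

```python
import json


def build_streaming_request(model, messages, include_usage=True):
    """Build a Chat Completions request body that streams the response.

    Hypothetical helper for illustration only; it does not call the API.
    """
    body = {"model": model, "messages": messages, "stream": True}
    if include_usage:
        # Ask for one extra final chunk carrying the usage stats.
        body["stream_options"] = {"include_usage": True}
    return body


payload = build_streaming_request(
    "gpt-4-turbo",
    [{"role": "user", "content": "Hi! How are you?"}],
)
print(json.dumps(payload, indent=2))
```

The resulting body can be sent with whichever HTTP client or SDK you already use.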
Important:
- Note that this usage-specific chunk will have choices: [], so if you turn this feature on, you may need to update any code that accesses choices[0] to first check whether it is the usage chunk.
- Additionally, note that all the normal chunks that appear earlier in the response will contain usage: null.
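The two points above can be handled with a small guard. The following is a minimal sketch, assuming the chunks have already been parsed into Python dicts; the function name `collect_stream` and the trimmed-down sample chunks are illustrative assumptions, not part of the API.

```python
def collect_stream(chunks):
    """Return (text, usage) from an iterable of parsed chunk dicts."""
    parts = []
    usage = None
    for chunk in chunks:
        # Normal chunks carry "usage": null; only the final chunk has data.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        choices = chunk.get("choices") or []
        if not choices:
            # Usage chunk: "choices" is empty, so choices[0] would raise IndexError.
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts), usage


# Sample chunks reduced to the fields this sketch reads.
example_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}], "usage": None},
    {"choices": [{"index": 0, "delta": {"content": "I'm doing well thanks, how are you?"}, "finish_reason": None}], "usage": None},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 6, "completion_tokens": 10, "total_tokens": 16}},
]
text, usage = collect_stream(example_chunks)
print(text)
print(usage)
```

Checking for an empty choices list before indexing keeps existing handlers working whether or not the usage chunk is requested.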
You can see an example of using this in our cookbook, and here is an illustrative example:
// Request: POST v1/chat/completions
{
"model": "gpt-4-turbo",
"messages": [
{"role": "user", "content": "Hi! How are you?"}
],
"stream": true,
// NEW: stream_options param enables the usage field and chunk.
"stream_options": {
"include_usage": true
}
}
// Streamed chunks in the response.
// Note that since we use Server-sent Events for streaming, we recommend using either our SDK or a library designed for Server-sent Events in the language you are using to parse these events.
{
"id": "chatcmpl-2EHCQqsRzdOlFskNehCMu2oOMTXhSjey",
"object": "chat.completion.chunk",
"created": 1693600000,
"model": "gpt-4-turbo",
"choices": [{
"index": 0,
"delta": {
"role": "assistant"
},
"finish_reason": null
}],
// NEW: since the initial request included `stream_options: {"include_usage": true}`, all streamed chunks will include `usage: null` (except the final one which will include the usage data for the entire completion)
"usage": null
}
{
"id": "chatcmpl-2EHCQqsRzdOlFskNehCMu2oOMTXhSjey",
"object": "chat.completion.chunk",
"created": 1693600000,
"model": "gpt-4-turbo",
"choices": [{
"index": 0,
"delta": {
"content": "I'm doing well thanks, how are you?"
},
"finish_reason": null
}],
"usage": null
}
{
"id": "chatcmpl-2EHCQqsRzdOlFskNehCMu2oOMTXhSjey",
"object": "chat.completion.chunk",
"created": 1693600000,
"model": "gpt-4-turbo",
"choices": [{
"index": 0,
"delta": {},
"finish_reason": "stop"
}],
"usage": null
}
// NEW: you will now receive a chunk with the usage data as the final chunk if you set `stream_options: {"include_usage": true}` in the original request
{
"id": "chatcmpl-2EHCQqsRzdOlFskNehCMu2oOMTXhSjey",
"object": "chat.completion.chunk",
"created": 1693600000,
"model": "gpt-4-turbo",
// NEW: empty choices list in the last chunk
"choices": [],
"usage": {
"prompt_tokens": 6,
"completion_tokens": 10,
"total_tokens": 16
}
}
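For readers curious what reading the stream looks like at the wire level, here is a minimal sketch in Python. It assumes each event arrives on a single `data:` line and that the stream ends with a `data: [DONE]` line; the name `iter_sse_json` and the sample lines are made up for illustration. Full Server-sent Events parsing has more cases (multi-line data fields, for example), so in real code the SDK or a dedicated SSE library remains the safer choice.

```python
import json


def iter_sse_json(lines):
    """Yield the JSON payload of each `data:` line until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        yield json.loads(data)


# Made-up sample of what the raw stream lines might look like.
sample_lines = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hi"}, "finish_reason": null}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 6, "completion_tokens": 10, "total_tokens": 16}}',
    "",
    "data: [DONE]",
]

usage = None
for chunk in iter_sse_json(sample_lines):
    # Only the final usage chunk has a non-null "usage" field.
    if chunk["usage"] is not None:
        usage = chunk["usage"]
print(usage)
```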