Chat completion "stream" API token usage

Hello.

I’m currently using the chat completion “stream” API and need information on token usage.

I’m wondering if usage data for this stream API is planned to be provided, and if so, when it will be available.

There was a guide as below, and it works fine for English, but the numbers of events and tokens differ for other languages (such as Korean).

I’m wondering if there is a guide to token calculation for other languages. (Is there a more memory-efficient way than collecting the whole response and calculating with tiktoken?)

Streaming the response does not produce a token-usage count. The stream only sends a “finish_reason” in its final delta.

There are no indications that the streaming method will change in the future. A change could break almost every piece of software that relies on the current behavior.

Token counting works the same way regardless of which language you use to talk to the AI. You can append the whole response into an accumulated string at the same time as you display it. Then, when finished, calculate the number of tokens in the complete response using the tiktoken library or similar.

Software manual: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

The forum post you linked is wrong. There is no guarantee that the number of stream pieces you receive corresponds to the number of tokens they contain.

OpenAI team here: we have now added a feature for this! See Usage stats now available when using streaming with the Chat Completions API or Completions API.
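A sketch of how a client might consume this, assuming the announced `stream_options={"include_usage": True}` behavior, where a final chunk with an empty `choices` list carries a `usage` object. The `Chunk` and `Usage` dataclasses below are simplified stand-ins for the SDK's types so the handling logic can run without a live API call; in real code the text lives in `chunk.choices[0].delta.content`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Usage:
    # stand-in for the SDK's usage object on the final chunk
    prompt_tokens: int
    completion_tokens: int

@dataclass
class Chunk:
    # stand-in for one streamed chunk: either text content or usage
    content: Optional[str] = None
    usage: Optional[Usage] = None

def consume_stream(chunks: List[Chunk]) -> Tuple[str, Optional[Usage]]:
    """Collect the streamed text and the usage reported by the final chunk."""
    pieces, usage = [], None
    for chunk in chunks:
        if chunk.content:
            pieces.append(chunk.content)
        if chunk.usage is not None:   # only the last chunk carries usage
            usage = chunk.usage
    return "".join(pieces), usage

# Simulated stream: two content deltas, then the trailing usage chunk.
text, usage = consume_stream([
    Chunk(content="Hello"),
    Chunk(content=" world"),
    Chunk(usage=Usage(prompt_tokens=9, completion_tokens=2)),
])
print(text, "->", usage.completion_tokens, "completion tokens")
```

With this option enabled, the server reports exact token counts, so the tiktoken accumulation workaround above is no longer needed for streaming.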
