OpenAI API - get usage tokens in the response when stream=True is set

In general, we can get token usage from response.usage.total_tokens, but when I set the parameter stream to True, for example:

def performRequestWithStreaming():
    openai.api_key = OPEN_AI_TOKEN
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is Python?"}],
        stream=True,
        temperature=0)

    for r in response:
        print(r)

all the response chunks look like this:

{
  "choices": [
    {
      "delta": {
        "content": "."
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1680676704,
  "id": "chatcmpl-71r4iJF8s8R7Uedb4FZO13U5CPdTr",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1680676704,
  "id": "chatcmpl-71r4iJF8s8R7Uedb4FZO13U5CPdTr",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}

There is no usage property now, so how can I know how many tokens were used?

15 Likes

I agree, I would like to have the usage in the response for streaming requests (at least in one of the events in the stream), similar to the usage returned for non-streaming requests.

However, here is a workaround in the meantime:

  1. The number of prompt tokens can be calculated offline, using tiktoken in Python for example (this is a guide I used).
  2. The number of events in the stream should correspond to the number of tokens in the response, so just count them while iterating.
  3. Adding 1 and 2 together gives you the total number of tokens for the request (see the sketch below).

I want to stress that I would still like to receive the usage in the response to the request and not have to compute it myself.
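
For example, here is a rough sketch of steps 1-3. This is only my approximation, assuming the pre-1.0 openai Python SDK; the prompt count ignores the small per-message formatting overhead the API adds, so it will underestimate slightly:

import tiktoken
import openai

def stream_with_token_estimate(messages, model="gpt-3.5-turbo"):
    enc = tiktoken.encoding_for_model(model)

    # 1. Estimate prompt tokens offline with tiktoken (content only,
    #    ignoring role/formatting overhead).
    prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)

    # 2. Count the streamed events that carry content as response tokens.
    completion_tokens = 0
    response = openai.ChatCompletion.create(
        model=model, messages=messages, stream=True, temperature=0)
    for chunk in response:
        if "content" in chunk["choices"][0]["delta"]:
            completion_tokens += 1

    # 3. The sum approximates the total tokens for the request.
    return prompt_tokens, completion_tokens, prompt_tokens + completion_tokens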

3 Likes

I noticed that on the usage page you can see the number of requests and the token usage per period, so is there any official API that can query the token usage of a conversation by its “id”? The “id” exists in both stream requests and normal requests. (“id”: “chatcmpl-74pW6*********************Wdi”)

1 Like

However, after extensive testing, I found that the token count produced by offline calculation is far from the actual value used, so Python’s tiktoken is not reliable on its own.
Here is the method I am currently using to count tokens:

  1. Treat each stream event that contains part of the answer as one token; adding all of these events together gives the total number of tokens in the answer. This is how I count the response tokens.
  2. The tokens used by the question can be counted with tiktoken (which, as I said, cannot be exact). Ha ha ha

1 Like

Can you clarify that? Question and answer? Are you saying that each chunk returned in the answer (response) counts toward both the question (request) and the answer (response)? So the entire token usage for a conversation turn (question and answer) is basically the number of chunks returned in the stream?

1 Like

I think that’s what I mean. I don’t understand English very well; I am Chinese and use translation software to translate your language, so there may be some discrepancies in the translation. Sorry.

1 Like

FWIW, an ideal place for it to be included would be when we receive the [DONE] message. I would like this too; it would be nice to avoid using an instance of tiktoken to do all of this.

2 Likes

I’m not using stream=true currently, but isn’t each chunk a single token… so you can count # of chunks and have # of tokens?

1 Like

That’s just the reply; the number of tokens consumed also includes what you send, plus parts of the data structure that makes up the messages array.

This package has a rubric for figuring it out using a Node implementation of tiktoken:

Still, it would be nice to hear from the source what the total consumed was for the individual request.
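
In Python, a hedged sketch of counting prompt tokens including that per-message overhead looks like the following; the constants here are my assumptions based on the gpt-3.5-turbo-0301 message format and may differ for other model versions:

import tiktoken

def estimate_prompt_tokens(messages, model="gpt-3.5-turbo-0301"):
    enc = tiktoken.encoding_for_model(model)
    tokens_per_message = 4   # assumed per-message overhead for role/formatting
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for value in message.values():
            num_tokens += len(enc.encode(value))
    num_tokens += 3          # assumed priming tokens for the assistant reply
    return num_tokens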

1 Like

You can only count completion tokens this way, not request tokens.

Looking up the usage by ID would be good, or a tokenizer endpoint would also be useful.

2 Likes

Has anyone noticed that in the official API docs, they show a usage field in the response chunk:

  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }

However, I am quite sure I didn’t get one, even in the final “stop” chunk. Can anyone help?

4 Likes

I just checked all the chunks I get back and you’re right: none of them have the usage field, even though the documentation shows one. Weird.

That field isn’t in the chunk object or the streaming example, though:

This can’t be right.

So we are purchasing tokens for a price per token, but we’re not allowed to know how many tokens a request has used?

This must be a bug.

5 Likes

Bumping this thread, as this is a major hole in the current API. Specifically, streaming responses should include a usage object, either as a cumulative sum or alternatively alongside the final "finish_reason"="stop" chunk.

Counting the number of chunks returned is not a valid workaround because (a) we have no explicit guarantee that each chunk is exactly one token, and (b) it cannot tell us the number of prompt_tokens used in the completion request, even though we are billed for them.

6 Likes

Well, you can just run tiktoken on each delta chunk and sum the results.
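
A minimal sketch of that per-chunk approach, assuming the pre-1.0 openai Python SDK and the cl100k_base encoding (note the caveat in the next reply that chunk boundaries are not guaranteed to fall on token boundaries, so this can over-count slightly):

import openai
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is Python?"}],
    stream=True,
    temperature=0)

completion_tokens = 0
for chunk in response:
    # Tokenize each content delta separately and sum the counts. If a chunk
    # ever splits a token, this over-counts compared to the assembled text.
    piece = chunk["choices"][0]["delta"].get("content", "")
    completion_tokens += len(enc.encode(piece))

print(completion_tokens)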

2 Likes

Since it’s not even stated that chunks will always fall on token breaks:

import tiktoken

class Tokenizer:
    def __init__(self, encoder="cl100k_base"):
        # cl100k_base is the encoding used by the gpt-3.5/gpt-4 chat models
        self.tokenizer = tiktoken.get_encoding(encoder)

    def tokens(self, text):
        return len(self.tokenizer.encode(text))

count = Tokenizer()

# Assemble the AI `reply` from the streamed deltas, as you would need to do
# anyway to add it to the chat history, then count the whole string at once.
tokens = count.tokens(reply)

A clever person could even count a function_call return by reconstructing the text the AI actually emitted and tokenizing that.
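
For instance, a sketch of assembling reply from the stream before counting, assuming the pre-1.0 openai Python SDK and the Tokenizer instance above:

import openai

reply = ""
for chunk in openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is Python?"}],
        stream=True,
        temperature=0):
    # Concatenate the content deltas; the final "stop" chunk has no content.
    reply += chunk["choices"][0]["delta"].get("content", "")

completion_tokens = count.tokens(reply)  # count the assembled text once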

2 Likes

Sure, that’s a “workaround”, but not a solution.

It (a) assumes tiktoken perfectly matches the tokenization the API actually uses, and (b) forces developers to add another dependency to their project, particularly when the only official version is the Python package (JS users have to rely on their own choice of third-party fork, which can be problematic for a few reasons).

I’d link to issues 22 and 97 on the GitHub repo but don’t have the rep to add links…

1 Like

It works when we use:
result = await api.ChatEndpoint.GetCompletionAsync(chatRequest);

Unfortunately, when we stream, Usage is null:
result = await api.ChatEndpoint.StreamCompletionAsync(chatRequest, partialResponse =>
{
    txtinfo = txtinfo + partialResponse.FirstChoice.Delta;
});

I have a similar problem: either keep the word “Processing” on screen for at least 20 s and be able to track usage, or have a responsive application without any clue about the costs. I think the Usage info on the official website also has some delay.