OpenAI API - get usage tokens in the response when stream=True

In general, we can get token usage from response.usage.total_tokens, but when I set the parameter stream to True, for example:

import openai

def performRequestWithStreaming():
    # Assumes OPEN_AI_TOKEN holds your API key.
    openai.api_key = OPEN_AI_TOKEN
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "What is Python?"}],
        stream=True,
        temperature=0)

    # Each iteration yields one streamed chunk.
    for r in response:
        print(r)

all the response chunks look like this:

{
  "choices": [
    {
      "delta": {
        "content": "."
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1680676704,
  "id": "chatcmpl-71r4iJF8s8R7Uedb4FZO13U5CPdTr",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}
{
  "choices": [
    {
      "delta": {},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "created": 1680676704,
  "id": "chatcmpl-71r4iJF8s8R7Uedb4FZO13U5CPdTr",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion.chunk"
}

There is no usage property now, so how can I know how many tokens were used?

3 Likes

I agree; I would like to have the usage in the response for stream requests (at least in one of the events in the stream), similar to the usage for non-stream requests.

However, here is a workaround for the meantime (see the sketch after this list):

  1. The number of prompt tokens can be calculated offline, using tiktoken in Python for example (this is a guide I used).
  2. The number of events in the stream should correspond to the number of tokens in the response, so just count them while iterating.
  3. Adding 1 and 2 together gives you the total number of tokens for the request.
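
A minimal sketch of those three steps, assuming the tiktoken package and the same ChatCompletion call as in the question above (note this ignores the chat format's per-message overhead, so the prompt count is approximate):

import openai
import tiktoken

# Assumes openai.api_key has already been set.
messages = [{"role": "user", "content": "What is Python?"}]

# Step 1: estimate prompt tokens offline with tiktoken.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)

# Step 2: count completion tokens by counting stream events
# that actually carry content.
completion_tokens = 0
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, stream=True, temperature=0)
for chunk in response:
    if chunk["choices"][0]["delta"].get("content"):
        completion_tokens += 1

# Step 3: add the two together for the total.
print(prompt_tokens + completion_tokens)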

I want to stress that I would still like to receive the usage in the response to the request and not have to compute it myself.

1 Like

I noticed that on the usage page you can see the number of requests and the token usage per period, so is there an official API that can query the token usage of a conversation by its "id"? The "id" exists in both stream requests and normal requests ("id": "chatcmpl-74pW6*********************Wdi").

However, after extensive testing, I found that the token count produced by offline calculation is far from the actual value used, so Python's tiktoken is not reliable.
That said, here is the method I am currently using to calculate tokens:

  1. Each stream event that contains part of the answer is treated as one token; adding all of these events together gives the total number of tokens in the answer (see the sketch below). This is how I count the response tokens.
  2. The tokens used by the question can be counted with tiktoken (which, as I said, is not really accurate). Ha ha ha
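
If you want to sanity-check the "one event = one token" assumption for a given reply, a quick sketch (same assumptions as above) is to assemble the streamed answer and compare the event count against tiktoken's count of the full text:

import openai
import tiktoken

# Assumes openai.api_key has already been set.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is Python?"}],
    stream=True,
    temperature=0)

# Collect only the chunks that carry answer content.
pieces = []
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        pieces.append(delta["content"])

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
print("stream events:", len(pieces))
print("tiktoken count:", len(enc.encode("".join(pieces))))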
1 Like

Can you clarify that? Question and answer? Are you saying that the chunks returned in the answer (response) account for both the question (request) and the answer (response)? So the token usage for an entire conversation turn (question and answer) is basically the number of chunks returned in the stream?

I think that's what you mean. I don't understand English very well; I am Chinese and I use translation software to translate your language, so there may be some discrepancies in the translation. Sorry.

FWIW: an ideal place for it to be delivered would be when we receive the [DONE] message… I would like this too; it would be nice to avoid needing an instance of tiktoken to do all of this.

I’m not using stream=true currently, but isn’t each chunk a single token… so you can count # of chunks and have # of tokens?

That's just the reply; the number of tokens consumed also includes what you send, plus parts of the data structure that makes up the messages array.
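
For a sense of what that overhead looks like, here is a sketch modeled on the message-counting recipe from the OpenAI cookbook; the per-message constants below are assumptions for gpt-3.5-turbo-0301 and may differ for other models:

import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0301"):
    # Each message is wrapped in format tokens, roughly
    # <|start|>{role}\n{content}<|end|>\n, which costs about
    # four extra tokens per message on gpt-3.5-turbo-0301.
    enc = tiktoken.encoding_for_model(model)
    num_tokens = 0
    for message in messages:
        num_tokens += 4  # assumed per-message overhead
        for value in message.values():
            num_tokens += len(enc.encode(value))
    num_tokens += 3  # the reply is primed with <|start|>assistant<|message|>
    return num_tokens

print(num_tokens_from_messages(
    [{"role": "user", "content": "What is Python?"}]))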

This package has a rubric for figuring it out using a Node implementation of tiktoken:

Still, it would be nice to hear from the source what the total consumed was for an individual request.

1 Like