How to calculate token usage using stream=True?

1.When I use api interface and adopt stream=True, if the transmission is interrupted halfway due to network problems or something, how can I calculate the token length of my inquiry?
2.Official website, when a response was cut off in the middle, how did you achieve the goal of answering the current question?

1 Like

Hi and welcome to the developer forum!

If you use tiktoken to count the reply you have so far and add an additional 10 tokens that is fairly accurate, you should experiment with stopping the streamed chat and using tick token and the log in your account to find a value that works for your use case, mine was 10.

I don’t understand question 2, can you rephrase it?

1 Like

Thank you for your answer! !
That is, when I use streaming, is there no charge for the content behind me when the network is interrupted? I just need to calculate the number of token in front and add 10.

The second question is: openai official website can continue to respond to the current message. How can it be realized through the api interface?
I can read this statement, understand?

To stop the current stream you break the connection to the socket handling it, i.e. close the connection.

Thank you for your answer.
Then I use stream=True to call api interface. How do I disconnect the socket?

Excuse me, when using tiktoken to calculate token, where did you see your plus 10 after calculation?

That was just the result of my testing, I performed some tests and stopped the stream, then I counted how many tick token counted, then I looked at my account usage and calculated the difference was an additional 10 tokens were counted by the OpenAI account system.

you should do your own tests to make sure this is the case for your code.

1 Like

Thanks for the answer!
So when I call the interface to stream data, how can I disconnect this socket to prevent it from continuing to charge?

You .close the object you created to perform the stream.

response = openai.Completion.create(
    for stream_resp in response:
        # Do stuff...
        if thing_happens:
except Exception as e:
    response.close()  <---- close connection, do this when you want to end the conenction.

Thank you for the method. This method is very useful.
After I close the socket, it won’t lead to subsequent billing, will it?
Then I can also do the calculation through tiktoken. I currently use a token.
I also need to ask you about my chat with official website on openai. He can continue to answer on the basis of a request. How can this be realized on the api? Do you have any ideas or thoughts that you can tell me?

Let me describe that function in detail: When I ask a question on official website, if the answer is limited for some reason, the answer is not finished. I can let it continue to answer this question.

On the question of what to do if the answer is incomplete, you should check the “finish_reason” if this is not “stop” but id instead “length” then you know to issue another API call appending the information so far and the command to “Truncated, please continue” and the model should carry on from where it left off.

Implementing this will be a technical challenge that I will leave as an exercise for you to complete.

1 Like

Thank you for your answer.
I think so, too, but this problem can be easily solved. This problem is that the maximum length token of the answer is added to the parameter when calling the api interface.
Then you need to associate the unanswered content with the contextual content, and then call the api to ask a new question for him to answer the unanswered question.
I think this is the solution. Do you think there is a problem?
But if there is an answer that exceeds the maximum token length of the model, it can’t be solved at present, right?
I’ll take a look at the correspondence of my token again. I use the official tiktoken package to calculate it accurately. So far, no deviation has been found. I need to calculate the statistical results several times.

Let me add your answer. When using asynchrony, you need to use the aclose method when closing.

1 Like

It is irrelevant how the count is carried out, the error is when using real-time transmission.
Today, after almost 2 months of testing, the consumption matches on my meter and on the OpenAI website.