How to calculate token usage using stream=True?

zhangth820 · August 11, 2023, 7:32am

1.When I use api interface and adopt stream=True, if the transmission is interrupted halfway due to network problems or something, how can I calculate the token length of my inquiry?
2.Official website, when a response was cut off in the middle, how did you achieve the goal of answering the current question?

Foxalabs · August 11, 2023, 7:40am

Hi and welcome to the developer forum!

If you use tiktoken to count the reply you have so far and add an additional 10 tokens that is fairly accurate, you should experiment with stopping the streamed chat and using tick token and the log in your account to find a value that works for your use case, mine was 10.

I don’t understand question 2, can you rephrase it?

zhangth820 · August 11, 2023, 7:45am

Thank you for your answer! !
That is, when I use streaming, is there no charge for the content behind me when the network is interrupted? I just need to calculate the number of token in front and add 10.

The second question is: openai official website can continue to respond to the current message. How can it be realized through the api interface?
I can read this statement, understand?

Foxalabs · August 11, 2023, 7:47am

To stop the current stream you break the connection to the socket handling it, i.e. close the connection.

zhangth820 · August 11, 2023, 7:50am

Thank you for your answer.
Then I use stream=True to call api interface. How do I disconnect the socket?

zhangth820 · August 11, 2023, 7:53am

Excuse me, when using tiktoken to calculate token, where did you see your plus 10 after calculation?

Foxalabs · August 11, 2023, 7:56am

That was just the result of my testing, I performed some tests and stopped the stream, then I counted how many tick token counted, then I looked at my account usage and calculated the difference was an additional 10 tokens were counted by the OpenAI account system.

you should do your own tests to make sure this is the case for your code.

zhangth820 · August 11, 2023, 7:58am

Thanks for the answer!
So when I call the interface to stream data, how can I disconnect this socket to prevent it from continuing to charge?

Foxalabs · August 11, 2023, 8:08am

You .close the object you created to perform the stream.

response = openai.Completion.create(
    stream=True,
)
try:
    for stream_resp in response:
        # Do stuff...
        if thing_happens:
          break
except Exception as e:
    print(e)
finally:
    response.close()  <---- close connection, do this when you want to end the conenction.

zhangth820 · August 11, 2023, 8:18am

Thank you for the method. This method is very useful.
After I close the socket, it won’t lead to subsequent billing, will it?
Then I can also do the calculation through tiktoken. I currently use a token.
I also need to ask you about my chat with official website on openai. He can continue to answer on the basis of a request. How can this be realized on the api? Do you have any ideas or thoughts that you can tell me?

Let me describe that function in detail: When I ask a question on official website, if the answer is limited for some reason, the answer is not finished. I can let it continue to answer this question.

Foxalabs · August 11, 2023, 8:32am

On the question of what to do if the answer is incomplete, you should check the “finish_reason” if this is not “stop” but id instead “length” then you know to issue another API call appending the information so far and the command to “Truncated, please continue” and the model should carry on from where it left off.

Implementing this will be a technical challenge that I will leave as an exercise for you to complete.

zhangth820 · August 11, 2023, 8:41am

Thank you for your answer.
I think so, too, but this problem can be easily solved. This problem is that the maximum length token of the answer is added to the parameter when calling the api interface.
Then you need to associate the unanswered content with the contextual content, and then call the api to ask a new question for him to answer the unanswered question.
I think this is the solution. Do you think there is a problem?
But if there is an answer that exceeds the maximum token length of the model, it can’t be solved at present, right?
I’ll take a look at the correspondence of my token again. I use the official tiktoken package to calculate it accurately. So far, no deviation has been found. I need to calculate the statistical results several times.

zhangth820 · August 11, 2023, 8:53am

Let me add your answer. When using asynchrony, you need to use the aclose method when closing.

AteneaIA · December 28, 2023, 8:54am

It is irrelevant how the count is carried out, the error is when using real-time transmission.
Today, after almost 2 months of testing, the consumption matches on my meter and on the OpenAI website.

Topic		Replies	Views
Chat completion "stream" API token usage API api	2	6796	May 6, 2024
How to get token usage for each API call in streaming model? API	8	8749	July 6, 2023
Incomplete Words in Streaming API	3	1366	January 29, 2024
Usage Info in API Responses Announcements	19	13207	September 27, 2023
Interrupting completion stream in Python API	12	14664	December 14, 2023

How to calculate token usage using stream=True?

Related topics