How do you get token count when streaming

Where do you get the token count from when you use the stream option for completions?

The system streams the result as blocks of text, but none of the responses have token counts attached to them.

The final [DONE] message doesn’t have the token count either.

You might consider writing a method that estimates the token count by counting the words received and applying OpenAI's documented word-to-token estimate.
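A minimal sketch of that estimate, assuming the rule of thumb from OpenAI's documentation that one token is roughly 3/4 of an English word (about 100 tokens per 75 words). The function names are made up for illustration, and the ratio is only a rough guide for English text; other languages will need a different value.

```python
import math

# Rule of thumb for English text: 1 token ~= 3/4 of a word.
# Assumption: this ratio does not hold for other languages.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate the token count from the whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / words_per_token)


def estimate_streamed_tokens(chunks, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Join the streamed blocks first, because a word can be split across blocks."""
    return estimate_tokens("".join(chunks), words_per_token)


# Example: three streamed blocks that form a nine-word sentence.
chunks = ["The quick brown ", "fox jum", "ps over the lazy dog"]
print(estimate_streamed_tokens(chunks))
```

Joining the blocks before counting avoids counting "jum" and "ps" as two separate words. The result is an estimate only and will drift from the real count, especially for non-English text.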

I have a tokenizer, but it is not accurate enough for foreign languages, where words are often represented by several tokens, i.e. the word-to-token ratio differs from English.

I have found the JavaScript function in the playground and tokenizer pages.

I guess I will have to reverse engineer the dictionary and regular expressions in the JavaScript, given that OpenAI doesn’t seem to pass the token count when streaming results.

The publicly available GPT-2 function uses a different dictionary.

Hi there! It seems like you’re looking for a way to calculate the token count when using the stream option for completions. I have good news for you! I’ve developed a library in C++ with C export for exactly this purpose, which can be used with various programming languages.

I’ve made this library available to the public, and I hope it will be helpful for others who need this functionality, as OpenAI doesn’t provide it by default. By using this library, you can save time and get the token count for the streamed blocks of text.

Please feel free to try it out and let me know if you have any questions or if I misunderstood your comment. I’m here to help!
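For anyone wiring a tokenizer into a streaming client, here is a rough sketch of the general pattern. It does not show the library's actual API, which isn't described in this thread: `encode` is a hypothetical stand-in for whatever tokenizer function you end up calling, and `toy_encode` exists only so the example runs on its own.

```python
from typing import Callable, Iterable, List


def count_streamed_tokens(
    chunks: Iterable[str],
    encode: Callable[[str], List[int]],
) -> int:
    """Accumulate the streamed text blocks, then tokenize the full text once."""
    full_text = "".join(chunks)
    return len(encode(full_text))


def toy_encode(text: str) -> List[int]:
    """Placeholder tokenizer (hypothetical): one 'token' per UTF-8 byte."""
    return list(text.encode("utf-8"))


# Replace toy_encode with a real tokenizer's encode function.
print(count_streamed_tokens(["Hel", "lo ", "world"], toy_encode))
```

Tokenizing the joined text once, rather than each block separately, is a deliberate choice: with a real subword tokenizer, merges can span block boundaries, so per-block counts may not add up to the count for the whole text.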