So there is no usage object returned when a chat completion API request is made with streaming (chat.completion.chunk). For comparison, here is an example usage object from a regular, non-streaming request:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo-0613",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
What is the reason it's missing from streaming API calls? Or am I missing something, or doing something wrong, and that's why I don't receive it?
Usage stats are not included when streaming, I think mostly because of the difficulty of knowing when a stream might be terminated from the other side: at what point do you send the usage stats? You can use tiktoken to count the tokens in the response deltas.
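For example, a minimal sketch in Node (assuming the `tiktoken` npm package, a WASM port of the Python library; the suggestion itself is library-agnostic):

```js
// Sketch: count streamed completion tokens yourself.
// Assumes the "tiktoken" npm package (WASM port of OpenAI's tiktoken).
import { encoding_for_model } from "tiktoken";

function countTokens(text, model = "gpt-3.5-turbo") {
  const enc = encoding_for_model(model);
  const n = enc.encode(text).length;
  enc.free(); // the WASM-backed encoder must be released explicitly
  return n;
}
```

Accumulate the `delta.content` strings as the chunks arrive and run the full text through `countTokens` once the stream ends, or whenever the client disconnects, which is precisely the situation where the API never gets to tell you the totals.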
Yes, I am aware of tiktoken and other token-counting libs. We are using Node.js, so I wasn't able to use tiktoken since it isn't available for Node.js, but I used the gpt-encoder lib instead. Thanks. I still think it would be useful if the API returned usage at the end, when the stream completes with finish_reason = stop.
OpenAI really should consider adding a StreamGUID parameter so we can make a follow-up call asking how much was consumed for any streaming attempt, up to a minute or two after the stream completes. Using tiktoken, even if it works fine, is not ideal.
I also hit this problem in https://cocalc.com’s ChatGPT integration. Here are the few lines of Node.js code I used to handle streaming and compute the usage:
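In outline, something like the following (a simplified sketch rather than the exact CoCalc code; it assumes the openai v4 Node SDK and the gpt-3-encoder lib, and the model name is illustrative):

```js
// Sketch: stream a chat completion and approximate the usage object yourself.
// Note: gpt-3-encoder predates cl100k_base, so counts for chat models are approximate.
import OpenAI from "openai";
import { encode } from "gpt-3-encoder";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function streamChatWithUsage(messages) {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages,
    stream: true,
  });

  let output = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? "";
    output += delta;
    process.stdout.write(delta); // forward each delta to the client as it arrives
  }

  // Approximate the usage object a non-streaming call would have returned.
  // The fixed per-message chat overhead is ignored here for brevity.
  const prompt_tokens = messages.reduce(
    (n, m) => n + encode(m.content ?? "").length, 0);
  const completion_tokens = encode(output).length;
  return {
    output,
    usage: {
      prompt_tokens,
      completion_tokens,
      total_tokens: prompt_tokens + completion_tokens,
    },
  };
}
```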
I used the gpt3-tokenizer library, which seems to be yet another option in addition to the ones mentioned above, to solve this problem: gpt3-tokenizer - npm
So I am aware of some third-party libs as well, but I was hesitant to use some of them, even the open-source ones. I handle it with the one recommended in the OpenAI cookbook and in the tiktoken readme, which is gpt-3-encoder. My concern is how accurate these libraries are, and whether they will stay up to date with changes on the model side if there are any. That's why it would be a lot better to use the counts returned by the API itself, as in the regular completion API.
When we look at this table from tiktoken, we can see that gpt-3-encoder doesn’t even support cl100k_base encoding.
So I see that this one is now recommended by OpenAI on their token-counting page (scroll all the way to the bottom). It is the equivalent of tiktoken in Python and handles cl100k_base encoding, with gpt-3.5 and gpt-4 support.
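Assuming this refers to the tiktoken/js-tiktoken npm packages (my reading), using cl100k_base directly looks like this:

```js
// Sketch: explicitly using the cl100k_base encoding shared by gpt-3.5-turbo
// and gpt-4. Assumes the "tiktoken" npm package; js-tiktoken offers an
// equivalent getEncoding("cl100k_base") without the manual free().
import { get_encoding } from "tiktoken";

const enc = get_encoding("cl100k_base");
console.log(enc.encode("Hello there, how may I assist you today?").length);
enc.free();
```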
With chat completions, you can absolutely measure the input tokens and the response you receive yourself, using a library such as tiktoken, and then attach token-count metadata to accounts and to the chat-history messages for later use. You only need to add the fixed per-call/per-message overhead to the inputs you send, and the size of a function specification can be measured by switching the functions off on a non-streaming call. The only thing left to allow for is the occasional failed call that still gets billed.
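For reference, here is a sketch of that fixed overhead, following the OpenAI cookbook's num_tokens_from_messages recipe for the -0613 chat models (3 tokens per message, 1 per name field, plus 3 to prime the reply); the port to Node and the use of the tiktoken npm package are my assumptions:

```js
// Sketch of the OpenAI cookbook's num_tokens_from_messages recipe, ported to JS.
// Constants are the published ones for gpt-3.5-turbo-0613 / gpt-4-0613.
import { encoding_for_model } from "tiktoken";

function numTokensFromMessages(messages, model = "gpt-3.5-turbo-0613") {
  const enc = encoding_for_model(model);
  const tokensPerMessage = 3; // <|start|>{role}<|message|>{content}<|end|>
  const tokensPerName = 1;    // extra token when a "name" field is present
  let count = 0;
  for (const message of messages) {
    count += tokensPerMessage;
    for (const [key, value] of Object.entries(message)) {
      count += enc.encode(String(value)).length;
      if (key === "name") count += tokensPerName;
    }
  }
  count += 3; // every reply is primed with <|start|>assistant<|message|>
  enc.free();
  return count;
}
```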
With assistants, you absolutely have no idea what the agent has been up to until the daily charges start showing up on your account.
Quick update on this: we have a version working that we are testing but are not yet happy with the overall design (there’s a lot of complexity in streaming and billing together). Hopefully this is something we can land for you all soon. Stay tuned!
I’m halfway through a layer-2 build with Hedera Hashgraph’s Java SDK, using a java2py lib, to enable HBAR, a public ledger with aBFT consensus, about the highest security guarantee there is.
The goal is to let my customer’s solar-industry tech-help chat app bill their customers in HBAR/$ terms, layered on top of my customer’s OpenAI API key account.
In effect, tokenizing usage with a margin.
When this is done I will have 20 hours a week coming available, in case you know of any AI projects that need help. Thank you.
For the last four months I’ve been studying and using LangChain’s libs, plus LangSmith to tune prompts and chains. Databutton is kind of pretty but not deep enough as a Streamlit-based chat dev tool/IDE…