I agree, and I would also like to have the usage included in the response to stream requests (at least in one of the events in the stream), similar to the usage returned for non-stream requests.
However, here is a workaround in the meanwhile:
1. The number of prompt tokens can be calculated offline, for example using tiktoken in Python (this is a guide I used).
2. The number of events in the stream should correspond to the number of tokens in the response, so just count them while iterating.
By adding 1 and 2 together you get the total number of tokens for the request (see the sketch after this list).
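For illustration, here is a minimal sketch of that workaround. It assumes the legacy `openai` Python SDK's `ChatCompletion` streaming interface, the `cl100k_base` encoding, and that each streamed content delta is one token, which is only an approximation:

import tiktoken
import openai  # assumes OPENAI_API_KEY is set in the environment

# Step 1: count prompt tokens offline (approximate; the chat format adds a few
# tokens of per-message overhead that this does not account for).
enc = tiktoken.get_encoding("cl100k_base")
messages = [{"role": "user", "content": "Hello, how are you?"}]
prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)

# Step 2: count streamed events, treating each content delta as one token
# (an assumption; chunks are not guaranteed to align with token boundaries).
completion_tokens = 0
for chunk in openai.ChatCompletion.create(
    model="gpt-3.5-turbo", messages=messages, stream=True
):
    if chunk["choices"][0]["delta"].get("content"):
        completion_tokens += 1

# Adding 1 and 2 gives an estimate of the request's total token usage.
total_tokens = prompt_tokens + completion_tokens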
I want to stress that I would still like to receive the usage in the response to the request rather than have to compute it myself.
I noticed that on the usage page you can see the number of requests and token usage per period, so is there any official API that can query the token usage of a conversation by its "id"? The "id" exists in both stream requests and normal requests ("id": "chatcmpl-74pW6*********************Wdi").
However, after extensive testing, I found that the token count produced by an offline token calculator is far from the actual value billed, so Python's tiktoken is not reliable.
However, here is the method I am currently using to calculate tokens:
Each stream event containing part of the answer is treated as one token, and when all of these events are added up, they equal the total number of tokens in the answer. This is how I calculate the response tokens.
The tokens for the question (prompt) can only be counted with tiktoken, which really isn't workable. Ha ha ha
Can you clarify that? Question and answer? Are you saying that the chunks returned in the answer (response) account for both the question (request) and the answer (response)? So the token usage of the entire conversation turn (question and answer) is basically the number of chunks returned in the stream?
I think that's what you mean. I don't understand English very well. I am Chinese and I use translation software to translate your language, so there may be some discrepancies in the translation. Sorry.
FWIW: an ideal place for it to be picked up would be when we receive the [DONE] message… I would like this too; it would be nice to avoid using an instance of tiktoken to do all of this.
Bumping this thread, as this is a major hole in the current API. Specifically, streaming responses should include a usage object, either as a cumulative sum or alongside the final "finish_reason": "stop" chunk.
Counting the number of chunks returned is not a valid workaround because (a) we have no explicit guarantee that each chunk is exactly one token, and (b) it cannot tell us the number of prompt_tokens used in the completion request, even though we are billed for them.
Since it's not even stated that chunks will always be at token breaks:
import tiktoken

class Tokenizer:
    def __init__(self, encoder="cl100k_base"):
        # cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4
        self.tokenizer = tiktoken.get_encoding(encoder)

    def tokens(self, text):
        return len(self.tokenizer.encode(text))

count = Tokenizer()
# assemble AI `reply` as you would need to do to add to chat history
tokens = count.tokens(reply)
A clever person could even count the tokens of a function_call return by reassembling it into the text the AI emitted.
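As a rough sketch of that idea, reusing the Tokenizer above: it assumes the function_call is a name plus a JSON-string arguments field, and that its token cost is roughly the tokens of that emitted text; the API's actual accounting for function calls is not documented, so treat this as an approximation.

# Approximation only: assumes the function_call's token cost is roughly the
# tokens of the emitted name and arguments text; actual accounting may differ.
def function_call_tokens(function_call, counter=count):
    name = function_call.get("name", "")
    arguments = function_call.get("arguments", "")  # arguments arrive as a JSON string
    return counter.tokens(name) + counter.tokens(arguments)

# e.g. fc = response["choices"][0]["message"].get("function_call")
#      if fc: reply_tokens = function_call_tokens(fc)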
Sure, that’s a “workaround”, but not a solution.
It (a) assumes tiktoken perfectly matches the API's own token counting, and (b) forces developers to add another dependency to their project, particularly when the only official version is the Python package (JS users have to rely on a third-party fork of their choice, which can be problematic for a few reasons).
I’d link to issues 22 and 97 on the github repo but don’t have the rep to add links…
It works when we use
result = await api.ChatEndpoint.GetCompletionAsync(chatRequest);
Unfortunately, when we stream, Usage is null.
result = await api.ChatEndpoint.StreamCompletionAsync(chatRequest, partialResponse =>
{
    txtinfo = txtinfo + partialResponse.FirstChoice.Delta;
});
I have a similar problem: either show the word "Processing" for at least 20 s (non-streaming) and keep track of usage, or have a responsive application (streaming) with no clue about costs. I think the Usage info on the official website also has some delay.