I’m wondering if there is a guide for token calculation for other languages. (Is there a more memory-efficient way than gathering all the responses and calculating with tiktoken?)
Streaming of the response does not produce a count of the token usage. It only sends a “finish_reason” as the last delta piece.
There is no indication that the streaming behavior will change in the future. A change could break almost all existing software that relies on the current behavior.
The token-counting method is the same regardless of which programming language you use to talk to the AI. You can append each piece of the response to a string as you display it. Then, when the stream has finished, calculate the number of tokens in the complete response using the tiktoken library or similar.
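As a rough illustration of that approach, here is a minimal Python sketch. The names `collect_and_count` and `tiktoken_encoder` are hypothetical helpers made up for this example, and `cl100k_base` is only an assumed encoding; use whichever one matches your model. The stream is represented as any iterable of text pieces, so you would feed it the delta content extracted from your own client.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple


def collect_and_count(
    deltas: Iterable[Optional[str]],
    encode: Callable[[str], Sequence],
    on_piece: Optional[Callable[[str], None]] = None,
) -> Tuple[str, int]:
    """Accumulate streamed text pieces, then count tokens once at the end.

    deltas   -- text pieces from the stream (None or "" pieces are skipped)
    encode   -- any function that turns a string into a sequence of tokens
    on_piece -- optional callback used to display each piece as it arrives
    """
    parts: List[str] = []
    for piece in deltas:
        if not piece:
            # The final delta usually carries only a finish_reason, no text.
            continue
        parts.append(piece)
        if on_piece is not None:
            on_piece(piece)

    # Join once at the end instead of repeatedly concatenating strings.
    text = "".join(parts)
    return text, len(encode(text))


def tiktoken_encoder(encoding_name: str = "cl100k_base") -> Callable[[str], Sequence]:
    """Return an encode function backed by tiktoken (third-party package).

    The encoding name is an assumption; pick the one that fits your model.
    """
    import tiktoken  # imported lazily so the helper above has no hard dependency

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() treats special-token text as ordinary text
    # instead of raising an error if it shows up in the response.
    return lambda text: encoding.encode(text, disallowed_special=())
```

You would call it as `collect_and_count(pieces, tiktoken_encoder(), on_piece=print)`. The count is taken on the complete text because tokenizing each piece separately and adding up the results can give a different number when a token spans two pieces. The memory cost is one copy of the response text, which is usually small next to everything else the application holds.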