I'm not sure about the exact use of either. But the Tokenizer is just a local program you can call without the API. And as I understand it, Langfuse is more like an API layer between you and OpenAI.
Using Langfuse seems like a lot of work just to retrieve a token count. The problem is that OpenAI Assistants are still in beta, so not finished, and there's no word on whether they're planning to return the token consumption…
Short answer is that you can’t really predict the token usage at the moment, particularly when using functions or the available tools (retrieval, code interpreter).
This has been widely requested, but as far as we know there has been no improvement to the Assistants API since its initial beta launch in November.
Also, the tool itself has been widely replicated. I'd encourage you to build something that goes beyond a UI for the Assistants API.
There is no way to get it through API calls.
The way I've been doing it is just counting tokens and simulating the completion-call lifecycle, and it's close enough.
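If you want to try the same approach, here is a minimal sketch of the token-counting side using tiktoken. The per-message overhead constant is my own rough assumption about chat-format framing, and the hidden instructions / tool context the Assistants API adds are not included, so treat the result as an estimate only.

```python
import tiktoken

def estimate_prompt_tokens(messages, model="gpt-4-1106-preview"):
    """Rough estimate of prompt tokens for a list of {"role", "content"} dicts.

    The +4 tokens per message is an assumed chat-format overhead; system
    instructions and retrieval/tool context injected by the Assistants API
    are NOT counted here, so real usage will be higher.
    """
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")

    total = 0
    for message in messages:
        total += 4  # assumed per-message formatting overhead
        total += len(enc.encode(message["role"]))
        total += len(enc.encode(message["content"]))
    return total

print(estimate_prompt_tokens([{"role": "user", "content": "Hello, assistant!"}]))
```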
Assuming you are following the steps outlined here on how to use the Assistants API, the following method should work for you!
Using the Assistants endpoint, you can access the token count via the run object.
After you've created a thread, added a message to the thread, and run the assistant, you must wait for the run to finish. The run data includes the usage (prompt tokens, completion tokens, and total tokens) in the output.
This data is available via the List runs, List run steps, Retrieve run, Retrieve run step, and Modify run endpoints.
You can create a function that adds up each individual API call's token count by run, or even sum the totals of each run to get a total (conversation) count.
You can see what data is available from the run object here.
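To make those steps concrete, here is a minimal sketch using the openai Python SDK's beta namespace. The assistant ID and message content are placeholders, and the polling loop is just one way to wait for the run; treat this as an illustration of reading run.usage rather than production code.

```python
import time
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder: your assistant's ID

# 1. Create a thread and add a user message.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarise the attached document.",  # placeholder content
)

# 2. Run the assistant and wait for the run to finish.
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=ASSISTANT_ID)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# 3. Usage is populated once the run completes.
print(run.usage.prompt_tokens, run.usage.completion_tokens, run.usage.total_tokens)

# 4. Sum usage across all runs on the thread for a conversation total.
runs = client.beta.threads.runs.list(thread_id=thread.id)
conversation_total = sum(r.usage.total_tokens for r in runs.data if r.usage)
print("conversation total tokens:", conversation_total)
```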
Hi @jorgeintegrait, sorry to bother you, but I wanted to know if there has been any improvement yet. I've been working with an assistant using retrieval as a tool, and when I use run.usage, the number of tokens I get is not the same as what the usage page reports. Even for the number of requests: if I use the assistant once, the number of requests shown on the usage page is about 16. Thank you in advance.
The important number is the token count, not so much the API calls, as OpenAI doesn’t charge per call.
Take a look at the usage screen and do a very slow test (their usage screen sometimes takes a while to update) and compare those results with the usage results from the API.
Hello @jorgeintegrait, I followed the steps you mentioned, but the numbers obtained from the run.usage.prompt_tokens and run.usage.completion_tokens metrics are 48566 and 1116, respectively. However, the usage dashboard reports different figures for context tokens (57477) and generated tokens (1189). I don't quite understand where the problem is, or perhaps I'm using the functionality incorrectly.
This lines up with my testing; the token reporting of the API seems inconsistent at the moment.
What I imagine is that it is counting some output tokens as context tokens again, and that makes the difference between what the API sees and what the back end reports. It is also possible that they even use different calculations.
At the moment, it seems more of a guideline than a definite cost estimate. Nonetheless, you can at least use this information in your project and estimate that the context cost could be up to 20% higher than the token count returned from the API.
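If you want to budget on the safe side, you could apply that margin as a simple multiplier on what run.usage reports. The 1.2 factor below is just that rough 20% assumption from the observations above, not an official figure.

```python
SAFETY_MARGIN = 1.2  # assumed ~20% headroom over run.usage, per the discrepancy seen above

def estimated_billed_prompt_tokens(run):
    """Pad the API-reported prompt tokens to approximate the dashboard figure."""
    return int(run.usage.prompt_tokens * SAFETY_MARGIN)
```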
If you provide details of the timestamps (when you made your requests), it is possible that someone from OpenAI can add that information to the issue report and help fix it, as I imagine this is an issue they already know about.
In any case, best of luck! And sorry that there isn't a perfect answer atm.