I would like to create a platform where users can set up an OpenAI Assistant and then use it to create threads.
I would like to retrieve the token consumption of each API call a user makes so I can charge them accordingly.
I can't find any mention of this anywhere. It looks like it's only available when using Completions.
I’ve looked at the headers of my requests, no field concerning token consumption.
Do you guys know how I can get this information through API calls? Thank you
You could use a platform like Langfuse, or, if you just need token counts for text, you can use the tokenizer from OpenAI.
So I would need to make an API call for each API call I make? Is there not a way for each API call to return the token consumption as well?
I am not sure about the exact use of both. But the tokenizer is just a local program you can call without the API. And as I understand it, Langfuse is like an API layer between you and OpenAI.
Using Langfuse seems like a lot of work only to retrieve a token amount. The problem is that OpenAI Assistants are still in beta, so not finished. But there's no word on whether they're planning to return the token consumption…
Short answer is that you can’t really predict the token usage at the moment, particularly when using functions or the available tools (retrieval, code interpreter).
This has been widely requested, but as far as we know, there has been no improvement on the Assistants API since its initial beta launch in November.
Thanks for sharing your results, Luis; that adds up and is in line with what we've seen. The instructions are of course counted in any new thread.
What do you mean by old and new threads though?
This is a common question that the community still hasn't found an answer to. At the moment, it is abstracted away and you can only estimate it. If you're interested in that, I have a couple of recommendations here:
Also, the tool itself has been widely replicated. I'd encourage you to build something that goes beyond a UI for the Assistants API.
Best of luck,
There's no way to get it through API calls.
The way I’ve been doing it is just counting tokens and simulating the completion calls lifecycle, and it’s close enough
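A minimal sketch of that estimation approach (the function names are mine, and the rough 4-characters-per-token heuristic stands in for OpenAI's tiktoken tokenizer, which would give exact counts):

```python
def rough_tokens(text: str) -> int:
    """Crude token estimate (~4 chars per token for English text).
    For exact counts, swap in tiktoken.encoding_for_model(...) instead.
    """
    return max(1, len(text) // 4)

def estimate_run_prompts(turns: list[str]) -> list[int]:
    """Estimate prompt tokens for each successive run in a thread.

    This simulates the completion-call lifecycle: the Assistants API
    resends the whole thread on every run, so run i is charged for all
    messages up to and including message i.
    """
    totals, history = [], []
    for msg in turns:
        history.append(msg)
        totals.append(sum(rough_tokens(m) for m in history))
    return totals

print(estimate_run_prompts(["a" * 8, "b" * 4, "c" * 4]))  # [2, 3, 4]
```

Note how the per-run estimate grows as the thread gets longer, which is why long conversations get expensive even when individual messages are short.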
Thanks Jorge, I think I’ll wait till they improve this part. Thanks for your help anyway
Assuming you are following the steps outlined here on how to use the Assistants API, the following method should work for you!

After you've created a thread, added a message to the thread, and run the assistant, you must wait for the run to finish. The run data includes the usage (prompt tokens, completion tokens & total tokens) in the output.

This data is available via the List runs, List run steps, Retrieve run, Retrieve run step, and Modify run endpoints; you can see what data is available from these endpoints in the API reference, and access the token count via the run's usage field.

You can create a function that adds up each individual API call's token count per run, or even add up the totals across runs to get a whole-conversation count.
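A sketch of that flow, assuming the openai v1 Python SDK (the polling loop is illustrative, and `total_usage` is my own helper for the per-conversation aggregation described above):

```python
import time

def fetch_run_usage(thread_id: str, run_id: str) -> dict:
    """Poll a run until it reaches a terminal state, then return its usage.
    Sketch only: requires the openai package and a configured API key.
    """
    from openai import OpenAI
    client = OpenAI()
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in ("completed", "failed", "cancelled", "expired"):
            # usage is only populated once the run has finished
            return {"prompt_tokens": run.usage.prompt_tokens,
                    "completion_tokens": run.usage.completion_tokens,
                    "total_tokens": run.usage.total_tokens}
        time.sleep(1)

def total_usage(run_usages: list[dict]) -> dict:
    """Sum per-run usage dicts into a conversation-level total for billing."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in run_usages:
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

print(total_usage([
    {"prompt_tokens": 100, "completion_tokens": 20, "total_tokens": 120},
    {"prompt_tokens": 150, "completion_tokens": 30, "total_tokens": 180},
]))  # {'prompt_tokens': 250, 'completion_tokens': 50, 'total_tokens': 300}
```

Storing each run's usage dict as it completes, then summing with `total_usage`, gives you the per-conversation number you'd bill against.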