OpenAI Assistants: how to get the token count?

Hi there,

I would like to create a platform where users can set up an OpenAI Assistant and then use it to create threads.

I would like to retrieve the token consumption of each API call a user makes so I can bill them accordingly.

I can’t seem to find any mention of this anywhere. It looks like it’s only available when using Completions.

I’ve looked at the response headers of my requests; there is no field concerning token consumption.

Do you guys know how I can get this information through API calls? Thank you

You could use a platform like Langfuse, or, if you just want the token count of a piece of text, you can use the tokenizer from OpenAI.

So I would need to make an API call for each API call I make? Is there not a way for each API call to return the token consumption as well?

I’m not sure about the exact use of both, but the tokenizer is just a local program you can call without the API. And as I understand it, Langfuse is like an API layer between you and OpenAI.

Using Langfuse seems like a lot of work just to retrieve a token count. The problem is that OpenAI Assistants are still in beta, so not finished. And there’s no telling whether they’re planning on returning the token consumption…

Short answer is that you can’t really predict the token usage at the moment, particularly when using functions or the available tools (retrieval, code interpreter).

This has been widely requested, but as far as we know, there has been no improvement on the Assistants API since its initial beta launch in November.

Also, the tool itself has been widely replicated. I’d encourage you to build something that goes beyond a UI for the Assistants API.

Best of luck,


There’s no way to get it through API calls.
The way I’ve been doing it is just counting tokens and simulating the completions call lifecycle, and it’s close enough.

Thanks Jorge, I think I’ll wait till they improve this part. Thanks for your help anyway :wink:

1 Like

Assuming you are following the steps outlined here on how to use the Assistants API, the following method should work for you!

Using the Assistants endpoint, you can access the token count via the run object.

After you’ve created a thread, added a message to the thread, and run the assistant, you must wait for the run to finish. The run data includes the usage (prompt tokens, completion tokens, and total tokens) in the output.

The output will look like this:

"usage": {
     "prompt_tokens": 123,
     "completion_tokens": 456,
     "total_tokens": 579
}
This data is available via the List runs, List run steps, Retrieve run, Retrieve run step, and Modify run endpoints.

You can create a function that adds up each individual API call’s token count by run, or even add up the totals of each run to get a total (conversation) count.

You can see what data is available from the run object here.

Good luck!