OpenAI Assistants: how to get the token count?

Hi there,

I would like to create a platform where users can set up an OpenAI Assistant and then use it to create threads.

I would like to retrieve the token consumption of each API call that a user makes so I can charge them accordingly.

I can’t find any mention of this anywhere. It looks like it’s only available when using Completions.

I’ve looked at the headers of my requests, and there is no field concerning token consumption.

Do you guys know how I can get this information through API calls? Thank you!

You could use a platform like Langfuse, or, if you only need the token count of the text, you can use OpenAI’s tokenizer.

So I would need to make an extra API call for each API call I make? Isn’t there a way for each API call to also return the token consumption?

I’m not sure about the exact usage of either. But the tokenizer is just a local program you can run without an API call. And as I understand it, Langfuse is like an API layer between you and OpenAI.

Using Langfuse seems like a lot of work just to retrieve a token count. The problem is that OpenAI Assistants are still in beta, so they’re not finished. And there’s no way of knowing whether they’re planning on returning the token consumption…

Short answer is that you can’t really predict the token usage at the moment, particularly when using functions or the available tools (retrieval, code interpreter).

This has been widely requested, but as far as we know, there has been no improvement to the Assistants API since its initial beta launch in November.

Also, the tool itself has been widely replicated. I’d encourage you to build something that goes beyond a UI for the Assistants API.

Best of luck,

There’s no way to get it through API calls.
The way I’ve been doing it is just counting tokens and simulating the lifecycle of the completion calls, and it’s close enough.

Thanks Jorge, I think I’ll wait till they improve this part. Thanks for your help anyway :wink:

Assuming you are following the steps outlined here on how to use the Assistants API, the following method should work for you!

Using the Assistants endpoints, you can access the token count via the run object.

After you’ve created a thread, added a message to the thread, and run the assistant, you must wait for the run to finish. The run data then includes the usage (prompt tokens, completion tokens, and total tokens) in the output.

The output will look like this:

"usage": {
     "prompt_tokens": 123,
     "completion_tokens": 456,
     "total_tokens": 579
}
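Waiting for the run to finish before reading usage can be sketched as a simple polling loop. This is a minimal sketch, not an SDK helper: fetchRun is a hypothetical function you supply (for example, a wrapper around the Retrieve run endpoint), and the set of terminal statuses is an assumption to check against the current API reference. If the assistant uses function tools, the run pauses at requires_action until you submit tool outputs; this loop does not handle that case and would eventually give up.

```javascript
// Statuses after which a run no longer changes (assumed list; check the
// Assistants API reference for the current set).
const TERMINAL_STATUSES = new Set([
  "completed",
  "failed",
  "cancelled",
  "expired",
  "incomplete",
]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls fetchRun (hypothetical, caller-supplied: returns the current run
// object) until the run reaches a terminal status, then returns run.usage.
async function waitForRunUsage(fetchRun, { intervalMs = 1000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const run = await fetchRun();
    if (TERMINAL_STATUSES.has(run.status)) {
      // usage is null while the run is still in progress, so only read it here.
      return run.usage;
    }
    await sleep(intervalMs);
  }
  throw new Error(`Run did not finish after ${maxAttempts} polls`);
}
```

The returned object has the same prompt_tokens / completion_tokens / total_tokens shape as the output shown above.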

This data is available via the List runs, List run steps, Retrieve run, Retrieve run step, and Modify run endpoints.

You can create a function that records each API call’s token count per run, or even add up the totals of each run to get a total (conversation) count.
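A minimal sketch of such a function, assuming you already have an array of run objects for a thread (for example, the data array returned by the List runs endpoint). Runs that have not finished yet have a null usage and are skipped rather than counted as zero-token runs.

```javascript
// Adds up the usage of several runs (e.g. all runs on one thread) to get a
// conversation-level total. Runs whose usage is null are skipped.
function sumRunUsage(runs) {
  const total = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
  for (const run of runs) {
    if (!run.usage) continue; // run not in a terminal state yet
    total.prompt_tokens += run.usage.prompt_tokens;
    total.completion_tokens += run.usage.completion_tokens;
    total.total_tokens += run.usage.total_tokens;
  }
  return total;
}
```

For two finished runs with totals of 579 and 250 tokens plus one run still in progress, this returns a total_tokens of 829.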

You can see what data is available from the run object here.

Good luck!

// Assumes `run` is a run object retrieved after the run has finished
if (run.status === "completed") {
  console.log("\n");
  console.log("terminal", run.usage.total_tokens);
}

Hi @jorgeintegrait, sorry to bother you, but I wanted to know if there has been any improvement yet. I’ve been working with an assistant that uses retrieval as a tool, and when I use run.usage, the number of tokens I get is not the same as the one shown on the API usage page. The number of requests differs too: if I use the assistant once, the usage page shows about 16 requests. Thank you in advance.

The important number is the token count, not so much the API calls, as OpenAI doesn’t charge per call.

Take a look at the usage screen and do a very slow test (their usage screen sometimes takes a while to update) and compare those results with the usage results from the API.
For example:

  1. Have no usage for 10 minutes.
  2. Send one message to your assistant.
  3. Measure usage from API.
  4. Wait ~5-10 minutes. Analyze the usage from https://platform.openai.com/usage

If your numbers don’t match up, you can report it here.

Hello @jorgeintegrait, I followed the steps you mentioned, but the values I get from run.usage.prompt_tokens and run.usage.completion_tokens are 48566 and 1116, respectively. However, the usage page reports different figures for context tokens (57477) and generated tokens (1189). I don’t quite understand where the problem is, or perhaps I’m using a feature incorrectly.
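The comparison in the test procedure above boils down to a percentage difference per token type. A small sketch, assuming you copy the dashboard figures by hand; the field names contextTokens and generatedTokens are hypothetical labels for the "context tokens" and "generated tokens" numbers on the usage page.

```javascript
// Percentage by which the dashboard figure exceeds the run.usage figure.
// Positive means the dashboard shows more tokens than the API reported.
function percentDifference(apiTokens, dashboardTokens) {
  return ((dashboardTokens - apiTokens) / apiTokens) * 100;
}

// apiUsage: a run.usage object; dashboard: numbers copied from the usage page.
function compareUsage(apiUsage, dashboard) {
  return {
    prompt: percentDifference(apiUsage.prompt_tokens, dashboard.contextTokens),
    completion: percentDifference(apiUsage.completion_tokens, dashboard.generatedTokens),
  };
}
```

With the figures above (48566 vs 57477 and 1116 vs 1189), the dashboard is about 18.3% higher for context tokens and about 6.5% higher for generated tokens.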

This matches my testing; the API’s token reporting seems inconsistent at the moment.

My guess is that it counts some output tokens again as context tokens, which would explain the difference between what the API reports and what the back end reports. It is also possible that they use different calculations altogether.

For now, it seems more of a guide than a definitive cost estimate. Nonetheless, you can at least use this information in your project and estimate that the context cost could be up to 20% higher than the token count returned from the API.
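One way to apply that rule of thumb when charging users is to pad the prompt (context) tokens from run.usage before billing. This is a sketch, not an official formula: the 20% default simply mirrors the estimate above, and only prompt tokens are padded, although completion tokens differed slightly in the figures reported earlier in this thread.

```javascript
// Rough billing estimate: pads the prompt (context) token count from
// run.usage by a safety margin to cover the gap with the usage dashboard.
function estimateBillableTokens(usage, contextMarginPercent = 20) {
  // Multiplying before dividing keeps the intermediate value an exact integer.
  const paddedPrompt = Math.ceil((usage.prompt_tokens * (100 + contextMarginPercent)) / 100);
  return {
    prompt_tokens: paddedPrompt,
    completion_tokens: usage.completion_tokens,
    total_tokens: paddedPrompt + usage.completion_tokens,
  };
}
```

For the run above (48566 prompt, 1116 completion tokens) this gives 59396 billable tokens, which covers the 58666 (57477 + 1189) the dashboard showed.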

If you provide the timestamps of your requests, someone from OpenAI may be able to add that information to the issue report and help fix it, as I imagine this issue is already known to them.

In any case, best of luck, and sorry that there isn’t a perfect answer at the moment.

If I call the assistant in streaming mode, e.g. with the client.beta.threads.runs.stream method, how do I retrieve the token usage from the run object?

You could make a call to retrieve the run by ID once the stream finishes.

You can get this using stream.get_final_run().usage.
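Besides retrieving the run by ID or calling get_final_run(), the usage can also be read from the stream events themselves. The sketch below is in JavaScript to match the earlier snippet, although the question uses the Python SDK. It assumes the stream yields objects shaped like { event, data } and that a terminal run event such as thread.run.completed carries the full run object, including usage, as its data. The list of event names is an assumption to check against the streaming events reference.

```javascript
// Run events after which the run is in a terminal state
// (assumed list; see the Assistants streaming events reference).
const TERMINAL_RUN_EVENTS = new Set([
  "thread.run.completed",
  "thread.run.failed",
  "thread.run.cancelled",
  "thread.run.expired",
  "thread.run.incomplete",
]);

// Consumes an (async) iterable of { event, data } stream events and returns
// the usage carried by the terminal run event, or null if the stream ends
// without one (e.g. the run paused with requires_action).
async function usageFromStream(events) {
  let usage = null;
  for await (const { event, data } of events) {
    if (TERMINAL_RUN_EVENTS.has(event)) {
      usage = data.usage;
    }
  }
  return usage;
}
```

Reading usage while consuming the stream avoids the extra request for the run after the stream ends.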