Assistant API tokens usage

There is no usage information when retrieving a run/message/assistant with the Assistants API, unlike the Chat Completions API.
I wanted to know whether there is a way to get the cost/usage, or whether you are working on that.

Thank you very much! :slight_smile:

2 Likes

There is no method to see how much you’ve been charged per run.

The only report that you get is daily, by model, not exclusive to assistants.

Hiding how much this actually costs in the usage page was rolled out alongside assistants.

That should be the first warning.

+1

It would be beneficial to have usage metrics per assistant, per thread, per message, and per run request, similar to the call reply in the completion API. My objective is to divide usage per assistant and prevent the sending of messages once a specified usage limit (in tokens) is reached. This is a crucial API feature for many of us, and it should ideally be implemented already.
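The cut-off behavior described above can be approximated client-side today by recording each run's usage yourself. A minimal sketch, assuming you capture the "usage" dict from every completed run (the class and limit value here are hypothetical, not part of the API):

```python
# Sketch of a per-assistant token budget. You record the "usage" object
# returned with each completed run, then refuse to send new messages
# once a cap is hit. The class and the cap are assumptions, not API features.
from collections import defaultdict


class TokenBudget:
    """Tracks total_tokens per assistant and blocks sends over a cap."""

    def __init__(self, limit_per_assistant: int):
        self.limit = limit_per_assistant
        self.used = defaultdict(int)

    def record_run(self, assistant_id: str, usage: dict) -> None:
        # usage is the dict attached to a completed run, e.g.
        # {"prompt_tokens": 123, "completion_tokens": 456, "total_tokens": 579}
        self.used[assistant_id] += usage.get("total_tokens", 0)

    def may_send(self, assistant_id: str) -> bool:
        return self.used[assistant_id] < self.limit


budget = TokenBudget(limit_per_assistant=1000)
budget.record_run("asst_abc123", {"total_tokens": 579})
print(budget.may_send("asst_abc123"))  # True: 579 < 1000
budget.record_run("asst_abc123", {"total_tokens": 500})
print(budget.may_send("asst_abc123"))  # False: 1079 >= 1000
```

This only divides usage per assistant; per-thread or per-run splitting would just mean keying the dict differently.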

Are you currently working on this feature?

Thank you.

+1.

It would be helpful to see usage per assistant, even if not by thread or run.

1 Like

For any developers currently building an app on the API and looking to bring it to market: there is no way we can actually charge our customers based on usage if we cannot calculate the costs per assistant or per thread.

It is a big blocker in terms of building actual commercial solutions for customers.

1 Like

We love the new assistants API, but being able to meter usage is a crucial feature we’d need for us to be able to adopt it.

1 Like

I was thinking about calculating the charge myself after each run, using the input/output prices defined for the model. Wouldn't that be accurate?

You could calculate the monetary cost, knowing the model behind the assistant ID and its input and output pricing (divided by 1,000 or 1,000,000, depending on how the price is quoted).

Since the time this topic was created, and past the obfuscated costs of early assistants, you now get usage billing information back in the run response:

{
  "id": "run_abc123",
  "object": "thread.run",
  "created_at": 1698107661,
  "assistant_id": "asst_abc123",
  "thread_id": "thread_abc123",
  "status": "completed",
    ...
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  },
...
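Turning that usage block into a dollar figure is then a one-liner per run. A minimal sketch; the prices in the table below are assumptions taken from the gpt-3.5-turbo-1106 figures discussed later in this thread, so verify them against the current pricing page:

```python
# Convert a run's "usage" dict into a cost in dollars.
# Prices are per 1M tokens and are assumptions; check the pricing page.
PRICE_PER_1M = {"gpt-3.5-turbo-1106": {"input": 1.00, "output": 2.00}}


def run_cost(model: str, usage: dict) -> float:
    p = PRICE_PER_1M[model]
    return (usage["prompt_tokens"] * p["input"]
            + usage["completion_tokens"] * p["output"]) / 1_000_000


usage = {"prompt_tokens": 123, "completion_tokens": 456, "total_tokens": 579}
print(f"${run_cost('gpt-3.5-turbo-1106', usage):.6f}")  # $0.001035
```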

The actual token costs of using Assistants will overflow a 16 bit unsigned int…

Are the prices at Pricing | OpenAI the ones used for assistants?
Let's say a user is chatting with

gpt-3.5-turbo-1106
input $1.00 / 1M tokens
output $2.00 / 1M tokens

after a run I will do the math on

{
  prompt_tokens: 1288,
  completion_tokens: 527,
  total_tokens: 1815,
  prompt_token_details: { cached_tokens: 0 }
}

Assuming prompt_tokens = input and completion_tokens = output: if the user (hypothetically) uses 1M prompt_tokens and 1M completion_tokens, should I deduct $3 from their balance?
Or are there more computations that I am not aware of, so that if I do this I will end up broke :sweat_smile:

The request can fail, in which case you get no usage report, or an inaccurate one.

This can happen if you hit a model rate limit while the assistants backend continues calling the model internally, or from some general malfunction where the model was used but the call wasn't received and accounted for properly by the backend.

So besides the value you add on top, you should allow some buffer for failures and estimate them. That is especially important when streaming chat completions: the call can time out on you despite the model having run, or the o1 model may give you a content policy error despite billing you for its thinking. Even when you cancel a streaming API call, you have to run a token counter over the input and whatever you partially received to make a guess, because the usage report comes last.
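One simple way to build in that buffer is a safety multiplier on the measured cost before you bill the customer. The 10% factor here is an arbitrary assumption; tune it to your observed failure and under-reporting rate:

```python
# Apply a safety multiplier when metering customers, so failed or
# unreported runs don't come out of your margin. The 1.10 factor is
# an assumption; tune it to your observed failure rate.
def billable_cost(measured_cost: float, buffer: float = 1.10) -> float:
    return measured_cost * buffer


print(billable_cost(0.002342))  # ~10% over the measured cost
```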

If you are using assistants, there are additional, completely unpredictable, unreturned costs: $0.03 code interpreter sessions per thread that time out whenever they want after an hour or so, and vector stores billed per GB per day that may involve user-uploaded files expiring after 7 days (or not), with no usage report for any of it.

If you make similar thread calls in close proximity, you may also get a 50% cache discount on cached prompt tokens. You can subtract it in your calculation after the fact, but you can't know it ahead of time.
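Applying that discount after the fact looks roughly like this. A sketch, assuming the usage dict carries prompt_token_details.cached_tokens as in the terminal output above, a 50% discount on cached input tokens, and the $1.00 / $2.00 per-1M prices from earlier in the thread:

```python
# Cost calculation that credits the 50% cache discount on cached
# prompt tokens. Prices and the discount rate are assumptions taken
# from this thread; verify them against the current pricing page.
def cost_with_cache(usage: dict,
                    in_price: float = 1.00,
                    out_price: float = 2.00) -> float:
    cached = usage.get("prompt_token_details", {}).get("cached_tokens", 0)
    uncached = usage["prompt_tokens"] - cached
    return (uncached * in_price
            + cached * in_price * 0.5          # cached input at half price
            + usage["completion_tokens"] * out_price) / 1_000_000


usage = {"prompt_tokens": 1288, "completion_tokens": 527,
         "prompt_token_details": {"cached_tokens": 1000}}
print(f"${cost_with_cache(usage):.6f}")  # $0.001842 vs $0.002342 uncached
```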