How to count TOKEN in a thread with API

py_en · October 26, 2024, 8:06pm

Hi, I'm using these commands:

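from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create an empty thread, then add the user's prompt to it
thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=prompt,
)

# Start a run of the assistant on that thread
run = client.beta.threads.runs.create(
    thread_id=thread.id,  # use the id of the thread created above
    assistant_id="aaaaaaaaaaa",
    tool_choice={"type": "file_search"},
)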

Where can I add code that lets me see the token usage?
Thanks.

_j · October 26, 2024, 9:26pm

Unfortunately, you cannot obtain a true token count. You can retrieve the messages in a thread, both the assistant's final output and the user's input, and count the tokens in their text with a token-counting module (tiktoken), but that count is incomplete:

Internal tool calls and tool responses, which may include large sections of file search documents and code, are NOT exposed to you.
The contents of a thread are NOT encoded as tokens, as they may be sent to models requiring different token encoders with different efficiencies.
The container format that messages are placed in is an encoding specific to the "chat" format of a model.
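
As a rough illustration of the visible-text count described above, the sketch below sums tokens over message texts you have already pulled from a thread. The function name `count_visible_tokens` and its `encode` parameter are hypothetical. The demo uses a whitespace splitter as a stand-in encoder; with tiktoken installed, something like `tiktoken.get_encoding("o200k_base").encode` could be passed instead, assuming that encoding matches your model. Either way the result is only a lower bound, since tool calls, injected instructions, and the chat container format are not included.

```python
from typing import Callable, Iterable, Sequence


def count_visible_tokens(
    texts: Iterable[str],
    encode: Callable[[str], Sequence],
) -> int:
    """Sum the token counts of message texts using the supplied encoder.

    Only text you can see in the thread is counted, so this is a lower
    bound on what a run actually sends to the model.
    """
    return sum(len(encode(text)) for text in texts)


# Stand-in encoder for demonstration only: splits on whitespace.
# A real tokenizer's encode function would be passed here instead.
messages = ["How many tokens is this?", "Probably a handful."]
visible = count_visible_tokens(messages, str.split)
```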

You also cannot obtain the token cost of a submission in advance.

OpenAI has a budgeting mechanism so that the token input capacity of an AI model is not exceeded, but it is not shown to you and you cannot control it directly.
Enabling internal tool functions adds large text blocks of instructions to a run; the amount is specific to the assistant chosen and comes on top of that assistant's other instructions.
Multiple internal tool calls can keep adding to a thread, which is then re-run automatically, so you pay multiple times for a growing thread.
The billing that might reveal the individual costs of calls to a model is obfuscated.

Therefore, Assistants is a platform where the billing is unpredictable. The only control you are offered to limit the total expense aborts the process after you have already been billed for a partial generation, which may have produced no output for the user, on contents you cannot audit.
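
The control referred to here is presumably the run's `max_prompt_tokens` and `max_completion_tokens` settings; that is an assumption, since the post does not name them. A minimal sketch of checking whether a finished run was cut off by such a limit follows. It assumes the run object has `status` and `incomplete_details.reason` fields, as run objects in the OpenAI Python SDK do; `was_cut_off_by_token_limit` is a hypothetical helper, and the demo uses a fake object rather than a real API response.

```python
from types import SimpleNamespace

# Reasons assumed to indicate a run stopped by a token cap.
TOKEN_LIMIT_REASONS = {"max_prompt_tokens", "max_completion_tokens"}


def was_cut_off_by_token_limit(run) -> bool:
    """Return True if the run ended as 'incomplete' because of a token cap."""
    if getattr(run, "status", None) != "incomplete":
        return False
    details = getattr(run, "incomplete_details", None)
    return getattr(details, "reason", None) in TOKEN_LIMIT_REASONS


# Fake run object standing in for a retrieved run.
aborted_run = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_prompt_tokens"),
)
cut_off = was_cut_off_by_token_limit(aborted_run)
```

A run flagged this way has still consumed tokens up to the point it stopped, which is the cost the post is warning about.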

I hope this clarifies the advantages that using the Assistants endpoint offers you.

py_en · October 26, 2024, 9:34pm

Thank you.
That is clear, but it is unbelievable that you cannot see the costs through the API, only on the platform as a monetary cost.
I hope OpenAI lets us check the costs through the API in the future; otherwise it is difficult to use this application for end customers.

_j · October 26, 2024, 9:50pm

You can obtain a base token cost for one iteration of an assistant as it is configured by running a thread containing "reply only 'hello'" (provided it does not make an error and invoke tools first).
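
For the baseline measurement described above, the sketch below reads the token counts from a finished run. It assumes the run object exposes a `usage` field with `prompt_tokens`, `completion_tokens`, and `total_tokens`, as run objects in the OpenAI Python SDK do once the run has reached a terminal state (the field is empty while the run is still in progress). `usage_totals` is a hypothetical helper, and the numbers in the fake run are made up.

```python
from types import SimpleNamespace


def usage_totals(run) -> dict:
    """Return token counts from a finished run, or zeros if usage is absent."""
    usage = getattr(run, "usage", None)
    if usage is None:
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Fake run standing in for a real run retrieved after it completed.
fake_run = SimpleNamespace(
    status="completed",
    usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=3, total_tokens=1203),
)
baseline = usage_totals(fake_run)
```

With a real client, the run passed in would be the one obtained by polling `client.beta.threads.runs.retrieve(run_id=run.id, thread_id=thread.id)` until its status is terminal.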

However, the amplification of that cost by a growing conversation, internally repeated model calls, and especially file search results, which may add 10k-16k tokens to a thread for each search, is unpredictable and depends on the user input.
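
To see how quickly this compounds, here is a back-of-the-envelope sketch. It assumes, as a simplification, that each internal model call re-sends the whole thread and that every file search adds a fixed number of tokens. The 10k-16k range comes from the post above; the 1,200-token baseline and the function name are made up for illustration.

```python
def estimate_input_tokens(base: int, per_search: int, searches: int) -> int:
    """Estimate total input tokens over one run with `searches` file searches.

    The first model call sends `base` tokens; each search appends
    `per_search` tokens and triggers another call on the grown thread.
    """
    return sum(base + i * per_search for i in range(searches + 1))


# Two searches in a single run, at the low and high end of the range.
low = estimate_input_tokens(base=1_200, per_search=10_000, searches=2)
high = estimate_input_tokens(base=1_200, per_search=16_000, searches=2)
```

Under these assumptions, two searches turn a 1,200-token baseline into 33,600 to 51,600 input tokens for a single user message.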
