Hello everybody. I am trying a couple of things with the help of the Assistants API, but there are some small issues. I gave the assistant the instruction “Address the user as ‘Jonny Depp’” while creating it. The prompt I gave it was “Whats your name?” and it responded with “Hello, Jhonny Depp! My name is Assistant. How can I assist you today?”. I checked the usage in the response and it said that it used 47 total tokens. I calculated them myself and it should be about 34 tokens. Why is that?
Assistants can use a lot more tokens than just the input when they have tools enabled.
For this simple input, though, the overhead comes from the chat format that messages are placed into. Each message is wrapped in a container of additional tokens carrying the role, and the AI must also be prompted with extra tokens to produce its response.
Overhead above the content tokens:
7 tokens: first message
4 tokens: additional messages
In some cases you may also be billed for an extra unseen output token or two, possibly the AI signaling to the endpoint backend whether to invoke a tool.
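If you want to sanity-check the billed prompt count yourself, you can combine a tokenizer count of the message contents with the per-message overhead figures above. A rough sketch using the tiktoken package (assuming the cl100k_base encoding; the exact overhead can vary by model):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def estimate_prompt_tokens(messages):
    """messages: list of dicts like {"role": "system", "content": "..."}"""
    total = 0
    for i, msg in enumerate(messages):
        # Overhead figures from above: 7 tokens for the first message,
        # 4 tokens for each additional message.
        overhead = 7 if i == 0 else 4
        total += overhead + len(enc.encode(msg["content"]))
    return total

print(estimate_prompt_tokens([
    {"role": "system", "content": "Address the user as 'Jonny Depp'"},
    {"role": "user", "content": "Whats your name?"},
]))
```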
Ok. Thank you so much. However, I still have two questions about the Assistants API. First: when I provide the instruction to the assistant directly when creating / modifying it, will I be billed for the tokens it uses? Second: when using the file tool / vector storage and the AI takes a chunk of a file, will I be billed for those tokens too? Thanks!
You are billed for what actually goes into and out of a language model at run time.
So: API calls for setting up and changing assistants and files cost nothing (unless you create a vector store over 1 GB, which is billed for storage).
It is when a run is invoked to answer the latest user input that everything compiled as input context is sent to the AI model you’ve chosen, and the response is formed.
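As a minimal sketch with the openai Python SDK (assuming a recent version that exposes the beta Assistants endpoints and `create_and_poll`; the model name is just a placeholder), the setup calls involve no language model, and it is only the run that reports token usage:

```python
from openai import OpenAI

client = OpenAI()

# Setup calls: no language model is invoked here, so no token billing.
assistant = client.beta.assistants.create(
    model="gpt-4o-mini",
    instructions="Address the user as 'Jonny Depp'",
)
thread = client.beta.threads.create(
    messages=[{"role": "user", "content": "Whats your name?"}],
)

# The run is where instructions + thread messages are compiled and sent to the model.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
print(run.usage)  # prompt_tokens, completion_tokens, total_tokens for this run
```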
The vector storage is presented to the AI as a search tool. The tool itself takes tokens to give the instructions for usage. If the AI makes a search instead of replying to you, that means two AI model calls, with the second call receiving a large amount of retrieved document text as additional input.
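You can see this in the run steps: when the model chose to search before answering, the run contains a tool-call step plus a message-creation step, each reporting its own usage. A sketch, assuming the same client and run objects as above:

```python
# List the steps of the completed run; a file_search run typically shows a
# "tool_calls" step (the search) followed by a "message_creation" step (the reply).
steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id,
    run_id=run.id,
)
for step in steps.data:
    print(step.type, step.usage)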
The chat within a thread also grows, so subsequent calls consume more tokens the longer the AI’s “memory” of chat history extends.
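If that growth becomes a cost problem, one option is to cap how much history is resent on each run, for example with the truncation_strategy parameter (assuming an Assistants API version that supports it):

```python
# Only the last few thread messages are placed into the model's input context.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
    truncation_strategy={"type": "last_messages", "last_messages": 5},
)
```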