High token consumption due to large instructions in the Assistants API

Hi all,

I’m starting to work with the Assistants API.

I want to create an assistant that acts as an annotator for a specific dataset. I have a document of around 3.5K words (~5K tokens) containing very specific instructions on how to annotate the dataset.

I create the assistant using the ‘instructions’ field with a plain-text version of those guidelines. Then I want to run the assistant N times, sending only the specific document it must annotate (around ~60 tokens on average).

The thing is that the first call to the assistant on a thread consumes the tokens for the guidelines, which I understand. But on every subsequent call, the API usage metadata shows all previous messages accumulating. This means every request consumes as input tokens the guidelines plus ALL previous reviews.

What is the correct approach here? How can these instructions be consumed only once, so that each new request only spends tokens on the new document?

Here are the token usage results after processing 3 documents:

Processing review 1/1112
Review 1 processed (Time: 7.31s, Total Tokens: 5406, Prompt Tokens: 5326, Completion Tokens: 80)
Processing review 2/1112
Review 2 processed (Time: 7.00s, Total Tokens: 5556, Prompt Tokens: 5476, Completion Tokens: 80)
Processing review 3/1112
Review 3 processed (Time: 9.63s, Total Tokens: 5744, Prompt Tokens: 5664, Completion Tokens: 80)

As you can see, Prompt Tokens accumulates across requests, making the consumption too high.
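The growth in the numbers above is consistent with the full thread history being re-sent on every run. Here is a rough back-of-the-envelope sketch; the token sizes are assumptions loosely based on the figures in this thread, not exact API accounting:

```python
# Rough model of prompt-token growth on a single Assistants API thread.
# Assumed sizes (approximations, not exact API accounting):
GUIDELINES = 5000   # instructions, re-sent on every run
DOC = 60            # one review to annotate
REPLY = 80          # completion tokens produced per run

def prompt_tokens(run_index):
    """Approximate prompt tokens for run `run_index` (1-based) on one shared thread.

    Every run re-sends the instructions plus the full message history:
    all previously submitted docs and all previous assistant replies.
    """
    history = (run_index - 1) * (DOC + REPLY)
    return GUIDELINES + DOC + history

print([prompt_tokens(i) for i in range(1, 4)])  # grows by DOC + REPLY each run
```

The real deltas (150, then 188 tokens) vary because reviews and message overhead differ in length, but the pattern is the same: each run pays for everything that came before it.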

Welcome to the community @joaquim.motger

You can avoid accumulating costs by creating a separate thread for each of the 60-token docs, provided the requests are unrelated to each other.
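To see why separate threads help, compare the two strategies with a rough token model (the sizes below are assumptions based on the figures in this thread, not exact API accounting):

```python
# Assumed sizes, loosely matching the numbers reported above:
GUIDELINES = 5000  # instructions, included in every run under either strategy
DOC = 60           # one review to annotate
REPLY = 80         # completion tokens per run

def total_prompt_tokens_one_thread(n_docs):
    # Run i re-sends the instructions plus all (i - 1) earlier doc/reply pairs,
    # so the total cost grows quadratically with the number of docs.
    return sum(GUIDELINES + DOC + (i - 1) * (DOC + REPLY)
               for i in range(1, n_docs + 1))

def total_prompt_tokens_fresh_threads(n_docs):
    # A new thread per doc: every run is just instructions + the new doc,
    # so the total cost grows linearly with the number of docs.
    return n_docs * (GUIDELINES + DOC)

print(total_prompt_tokens_fresh_threads(1112))  # linear in n_docs
print(total_prompt_tokens_one_thread(1112))     # quadratic in n_docs
```

Note that the guidelines are still paid for on every run either way; fresh threads only remove the accumulating message history.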


Hi, thank you for your response! The issue here is that the guidelines are consumed on each request. If I understood correctly, these are system prompts, and therefore they should not count toward token usage on each request. But they do.

How can I make the assistant aware of the guidelines without them being consumed on each request?

That’s exactly how instructions are supposed to work. They are included for every run.

Hint: consider using the gpt-4o-mini model. It works great and is a tiny fraction of the cost of other models for input tokens.