Token Optimization for Assistants API - Excessive token count

marintudela · January 24, 2024, 11:29am

Hello,

I am currently utilizing the OpenAI API for sending messages to assistants and have encountered an issue with token usage. Each call I make seems to consume approximately 4000 tokens, which is perplexing considering my messages are about 50 tokens and the responses are typically around 200 tokens. I have lengthy instructions set up for the assistant, and I am wondering if this is impacting the token consumption.

To give you a better understanding, here’s the process I follow with my code:

1. Check for an existing thread ID to use; if there is none, create a new thread.
2. Select the assistant using a static ID in my code.
3. Generate the input, which is around 200-400 tokens.
4. Add a message to the thread.
5. Execute the thread (Run command).
6. Check the status of the run.
7. Retrieve the steps of the run.
8. Finally, when the status is “completed”, fetch the thread messages to read the response, handling cases where the response is not immediately available or an error occurred.

My specific question is: How can I optimize token usage when interacting with OpenAI’s API, especially considering the length of my instructions? I tried using fewer tokens in the instructions; I am billed for fewer tokens, but I don’t get the same quality of responses. Is there a way to prevent these long instructions from increasing the token cost per call? I would like to understand whether the instructions are billed on every call, contributing to the high token usage, and if so, how to avoid paying for them each time.

Any advice or shared experiences, particularly from those who have dealt with similar situations or have in-depth knowledge of OpenAI’s token billing system, would be greatly appreciated.

Thank you in advance for your assistance.

Diet · January 24, 2024, 12:26pm

Hi! Welcome to the forums!

Your instructions absolutely eat your tokens. Retrievals eat your tokens, a continuation of the thread eats your tokens. Actions eat tokens. Everything eats tokens. Om nom nom.
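To make that concrete, here is a rough back-of-the-envelope sketch in Python. It assumes each run re-sends the instructions plus the whole thread so far, and the 3000-token instruction size is a made-up figure for illustration, not a measured one. The function names are hypothetical, and real billing may differ (retrieval and tool calls add more on top).

```python
def estimate_prompt_tokens(instruction_tokens, history_tokens,
                           new_message_tokens, retrieval_tokens=0):
    """Rough prompt size for one run: everything the model has to read."""
    return (instruction_tokens + history_tokens
            + new_message_tokens + retrieval_tokens)


def simulate_thread(instruction_tokens, turns, retrieval_tokens=0):
    """Estimate total tokens per run for a list of (user, reply) token pairs.

    Assumption: the full thread history is included in every run.
    """
    history = 0
    totals = []
    for user_tokens, reply_tokens in turns:
        prompt = estimate_prompt_tokens(
            instruction_tokens, history, user_tokens, retrieval_tokens
        )
        totals.append(prompt + reply_tokens)
        # Both the user message and the reply stay in the thread.
        history += user_tokens + reply_tokens
    return totals


# Three turns of a 50-token message and a 200-token reply,
# with hypothetical 3000-token instructions.
per_run = simulate_thread(3000, [(50, 200)] * 3)
print(per_run)  # [3250, 3500, 3750]
```

Under these assumptions the instructions dominate every run, and the total still creeps up each turn as the thread grows.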

It’s my personal opinion that if you want to be cost conscious, assistants aren’t the best option out there.

Here’s a good post that summarizes it well:

jorgeintegrait · May 24, 2024, 5:45pm

With v2 of the API, some of this has improved: we now get prompt and completion token counts at both the run and step levels.
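A minimal sketch of reading those counts, assuming the run (or run step) object exposes a `usage` field with `prompt_tokens`, `completion_tokens`, and `total_tokens`, and that `usage` is empty until the run has finished. The helper name `usage_line` and the stand-in object are hypothetical; a real run would come from the SDK rather than `SimpleNamespace`.

```python
from types import SimpleNamespace


def usage_line(label, obj):
    """Format the token usage of a run or run step, if it is available."""
    usage = getattr(obj, "usage", None)
    if usage is None:
        # Assumption: usage is only populated once the run has finished.
        return f"{label}: usage not available yet"
    return (
        f"{label}: prompt={usage.prompt_tokens} "
        f"completion={usage.completion_tokens} total={usage.total_tokens}"
    )


# Stand-in for a completed run. With the openai Python SDK a real one
# would come from something like:
#   client.beta.threads.runs.retrieve(thread_id=..., run_id=...)
fake_run = SimpleNamespace(
    usage=SimpleNamespace(
        prompt_tokens=3750, completion_tokens=200, total_tokens=3950
    )
)
print(usage_line("run", fake_run))
```

Logging this per run makes it easy to see whether the prompt side (instructions plus history) or the completion side is driving the bill.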

Additionally, we can set maximum tokens per thread. It is not a perfect solution, though: it does not auto-truncate, the run just stops.

The recommendation for a production application would still be a custom RAG system, but the Assistants API is slowly getting there. It’ll probably be a good choice once it comes out of beta.
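A hedged sketch of how that cap might be wired up. As far as I know the limits are passed when creating a run, via `max_prompt_tokens` and `max_completion_tokens`, and a run that hits one ends with status `incomplete` and a matching `incomplete_details.reason`; treat those names as assumptions to check against the current API reference. The helpers `build_run_kwargs` and `stopped_by_token_cap` are hypothetical.

```python
from types import SimpleNamespace

# Assumed values of run.incomplete_details.reason when a cap is hit.
TOKEN_CAP_REASONS = {"max_prompt_tokens", "max_completion_tokens"}


def build_run_kwargs(thread_id, assistant_id,
                     max_prompt_tokens=None, max_completion_tokens=None):
    """Collect arguments for creating a run, adding caps only when set."""
    kwargs = {"thread_id": thread_id, "assistant_id": assistant_id}
    if max_prompt_tokens is not None:
        kwargs["max_prompt_tokens"] = max_prompt_tokens
    if max_completion_tokens is not None:
        kwargs["max_completion_tokens"] = max_completion_tokens
    return kwargs


def stopped_by_token_cap(run):
    """True if the run ended early because it reached a token cap."""
    details = getattr(run, "incomplete_details", None)
    reason = getattr(details, "reason", None)
    return run.status == "incomplete" and reason in TOKEN_CAP_REASONS


# Placeholder IDs; with the openai SDK the call would look something like
#   client.beta.threads.runs.create(**kwargs)
kwargs = build_run_kwargs("thread_placeholder", "asst_placeholder",
                          max_prompt_tokens=2000)

# Stand-in for a run that was cut off by the prompt cap.
capped_run = SimpleNamespace(
    status="incomplete",
    incomplete_details=SimpleNamespace(reason="max_prompt_tokens"),
)
print(stopped_by_token_cap(capped_run))  # True
```

Checking for the capped case explicitly lets the application decide what to do next (for example, start a fresh thread) instead of treating the run as an ordinary failure.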
