Assistants API consumes too many prompt tokens. What is the reason and how can I reduce it?

Hello. I am building a chat assistant with the Assistants API. I tested it during the day, and from the logs I can see that it consumes far too many prompt tokens: usually more than 5,000 with the GPT-4 model, and 8,000 to 9,000 or more with the GPT-3 model. My assistant does not have any tools enabled, but when I try it in the Playground it does not consume nearly as many prompt tokens.
What is the reason? I have attached a screenshot from some run logs.


I think this is because of the messages. The first messages have lower prompt usage, but as I sent more messages to the assistant, the usage increased with each one. I thought deleting messages from the thread might help, but it did not.
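Here is roughly how I am checking the usage, in case it helps anyone reproduce this. A minimal sketch with the openai Python SDK; `create_and_poll` and the `run.usage` fields may differ in older SDK versions, and the assistant id is a placeholder:

```python
# Minimal sketch: log prompt-token usage per run to watch the growth.
# Assumes the official openai Python SDK (v1.x) and an existing assistant.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

thread = client.beta.threads.create()

def ask(text: str) -> None:
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=text
    )
    # create_and_poll blocks until the run finishes (recent SDK versions)
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id="asst_..."  # placeholder id
    )
    if run.usage:
        print(f"prompt={run.usage.prompt_tokens} "
              f"completion={run.usage.completion_tokens}")

ask("First question")
ask("Second question")  # expect prompt_tokens to be noticeably higher
```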


@javidd - You are on the right track. The conversation history is passed to the LLM on every run so it can pull context and information from earlier messages. As a rough rule of thumb, 1 token ~ 4 characters, or 100 tokens ~ 75 words; if you delete messages from the thread, you can estimate the expected token savings from the length of the deleted text and compare it against the logged usage. You could also keep the system prompt concise and use techniques like RAG for context-based retrieval (if applicable) to save on tokens. Hope this helps - Cheers!
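If you want to sanity-check those numbers, you can count tokens locally with the tiktoken library before sending. A rough sketch; the per-message formatting overhead the API adds is not included, so treat the result as a lower bound:

```python
# Sketch: estimate how many prompt tokens a thread's history will cost.
# Uses tiktoken; per-message formatting overhead is approximated away.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

messages = [
    "You are a helpful assistant.",        # system prompt
    "What is the capital of France?",      # user
    "The capital of France is Paris.",     # assistant
]

total = sum(len(enc.encode(m)) for m in messages)
print(f"~{total} tokens of raw text "
      f"(plus a few tokens of per-message overhead)")
```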


Actually, I used the default options for the API. Later I created a new thread and checked the usage again. As I said, the usage increased with each new message. I am using just simple prompts, so it is not an API configuration or complex-prompt problem.

Hi, I read your comment, but what you described was not the reason. Have you tried the Assistants API yourself? I searched the internet and this community, and I see that many people have this problem. In 99% of cases the cause is the history. I think OpenAI should add a parameter like history_mode on/off so that we can use fewer tokens. Otherwise, why use the Assistants API if it consumes so many tokens? I wanted to use it for its file-reading feature and to build an assistant service. Maybe someone from OpenAI will read my comment and answer.
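Update: newer revisions of the Assistants API appear to expose something close to this, a truncation_strategy parameter on runs that limits how much history is sent to the model. A sketch, assuming the v2 Assistants endpoints; the ids are placeholders, and older SDK/API versions may not support the parameter:

```python
# Sketch: cap how much conversation history each run sends to the model.
# Assumes Assistants API v2; truncation_strategy is unsupported in older
# API versions, so verify against your SDK's documentation.
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_...",       # placeholder thread id
    assistant_id="asst_...",      # placeholder assistant id
    truncation_strategy={
        "type": "last_messages",  # only send the most recent messages
        "last_messages": 4,       # keeps prompt tokens roughly bounded
    },
)
print(run.usage.prompt_tokens if run.usage else "no usage reported")
```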