Multi Agent Cost Optimization - Assistants API vs Completions

Hi everyone,

I’m looking to understand the cost differences between the Assistants API and the Completions API for a multi-agent conversational chatbot.

Background: Current Implementation

I’ve implemented a multi-agent system using LangGraph, leveraging OpenAI’s ChatCompletion API. The setup works well, but I’m seeking ways to optimize costs.

The chatbot involves back-and-forth conversations, and the prompt sent to OpenAI typically looks like this:

<Instruction Prompt>  
Conversation History:  
- User msg: 1  
- AI msg: 1  
- AI Tool Calling: 1  
- AI Tool Calling: 2  
- User msg: 2  
- AI msg: 2  

The challenge is the length of the instruction prompt, which exceeds 5,000 tokens. With several thousand users, the cost of input tokens (cached or not) adds up quickly. On every call the entire prompt is resent, even though only the conversation history changes (a relatively small token count). The main cost driver is the static instructions.
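To make the cost driver concrete, here is a back-of-envelope estimate. The token counts, price, and call volume below are illustrative assumptions (the price is not a quote; check current pricing for your model):

```python
# Rough estimate of input-token cost when the static instructions
# are resent on every call. All numbers are illustrative assumptions.
INSTRUCTION_TOKENS = 5_000    # static instruction prompt (from the post)
CONVERSATION_TOKENS = 300     # assumed per-call conversation delta
PRICE_PER_1M_INPUT = 2.50     # assumed $/1M input tokens
CALLS = 100_000               # assumed total calls across all users

total_input = (INSTRUCTION_TOKENS + CONVERSATION_TOKENS) * CALLS
instructions_only = INSTRUCTION_TOKENS * CALLS

cost = total_input / 1_000_000 * PRICE_PER_1M_INPUT
share = instructions_only / total_input

print(f"total input cost: ${cost:,.2f}")
print(f"share from static instructions: {share:.0%}")
```

Under these assumptions, the static instructions account for the overwhelming majority of input tokens, which is why reducing or caching them matters more than trimming the conversation history.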

I’m aware of potential optimizations like prompt segregation, RAG, Q/A vector stores, etc., which I plan to explore.

Question: Cost Optimization with Assistants API Threads

I’m considering migrating to the Assistants API, which introduces the concept of “Threads.” My understanding is that Threads might eliminate the need to resend initial instructions with every call, potentially reducing input token costs.

However, I’m uncertain whether the Assistants API still resends the full prompt—including instructions—internally for each Thread call.

Can anyone clarify how the cost components work for the Assistants API with Threads? Specifically:

  • Does it genuinely reduce input tokens compared to the Completions API?

Any insights or experiences would be greatly appreciated!

Thank you!


Assistants uses ChatCompletions under the hood.

Even though it abstracts away conversation management through Threads, you still pay for the full conversation on each call, minus any potential discounts from caching.
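One practical consequence of those caching discounts: they apply to a repeated, byte-identical prompt prefix, so it helps to keep the long static instructions at the very front of the payload and append only the per-turn content. A minimal sketch, assuming the Chat Completions message format (the exact caching rules and discount rates are worth verifying in the docs):

```python
# Keep the unchanging instructions as a stable prefix so repeated
# calls can benefit from prompt caching; only the suffix varies.
STATIC_INSTRUCTIONS = "<5000-token instruction prompt>"  # never changes

def build_messages(history, user_msg):
    """Static prefix first, variable conversation suffix last."""
    return (
        [{"role": "system", "content": STATIC_INSTRUCTIONS}]
        + history
        + [{"role": "user", "content": user_msg}]
    )
```

Anything that mutates the beginning of the prompt between calls (timestamps, per-user data interpolated into the instructions) would defeat the prefix match and forfeit the discount.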

No.


Thanks @RonaldGRuckus, I suspected exactly that.

My supposition is that the Assistants API with Threads builds a prompt list comprising:

<Assistant Prompt>
<User Msg1>
<AI Msg>
<...>

and then sends it to the Completions API.
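That supposition can be sketched as a Chat Completions-style message list. The role names follow the documented Completions format, but the Assistants API's actual internal payload is not public, so this is a guess at its shape, not a confirmed implementation:

```python
# Hypothetical reconstruction of what a Thread run sends to the model
# on each call: instructions plus the full accumulated history.
messages = [
    {"role": "system", "content": "<Assistant instructions>"},
    {"role": "user", "content": "<User Msg 1>"},
    {"role": "assistant", "content": "<AI Msg 1>"},
    # ...every prior turn stored in the thread...
    {"role": "user", "content": "<Current user msg>"},
]

# Every entry above counts toward input tokens on every run,
# which is why Threads don't reduce input-token cost by themselves.
```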


@RonaldGRuckus does this apply to the assistant instructions too? Am I charged for system instructions every time I run a thread?

Every time an API call is made, all information for that thread is sent to the model: all of the past messages, the system instructions, the current user prompt, and any RAG data from uploaded files.
