Multi Agent Cost Optimization - Assistants API vs Completions

Hi everyone,

I’m looking to understand the cost differences between the Assistants API and the Completions API for a multi-agent conversational chatbot.

Background: Current Implementation

I’ve implemented a multi-agent system using LangGraph, leveraging OpenAI’s ChatCompletion API. The setup works well, but I’m seeking ways to optimize costs.

The chatbot involves back-and-forth conversations, and the prompt sent to OpenAI typically looks like this:

<Instruction Prompt>  
Conversation History:  
- User msg: 1  
- AI msg: 1  
- AI Tool Calling: 1  
- AI Tool Calling: 2  
- User msg: 2  
- AI msg: 2  

The challenge is the length of the instruction prompt, which exceeds 5,000 tokens. With several thousand users, the cost of input tokens (cached or not) adds up quickly. On every call the entire prompt is resent, even though only the conversation history changes (a relatively small token count). The main cost driver is the static instructions.
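To make the cost driver concrete, here is a back-of-envelope estimate. The token counts, price, and call volume below are illustrative assumptions (the price is not a quote; check current pricing for your model):

```python
# Rough estimate of input-token cost when the static instructions
# are resent on every call. All numbers are illustrative assumptions.
INSTRUCTION_TOKENS = 5_000    # static instruction prompt (from the post)
CONVERSATION_TOKENS = 300     # assumed per-call conversation delta
PRICE_PER_1M_INPUT = 2.50     # assumed $/1M input tokens
CALLS = 100_000               # assumed total calls across all users

total_input = (INSTRUCTION_TOKENS + CONVERSATION_TOKENS) * CALLS
instructions_only = INSTRUCTION_TOKENS * CALLS

cost = total_input / 1_000_000 * PRICE_PER_1M_INPUT
share = instructions_only / total_input

print(f"total input cost: ${cost:,.2f}")
print(f"share from static instructions: {share:.0%}")
```

Under these assumptions, the static instructions account for the overwhelming majority of input tokens, which is why reducing or caching them matters more than trimming the conversation history.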

I’m aware of potential optimizations like prompt segregation, RAG, Q/A vector stores, etc., which I plan to explore.

Question: Cost Optimization with Assistants API Threads

I’m considering migrating to the Assistants API, which introduces the concept of “Threads.” My understanding is that Threads might eliminate the need to resend initial instructions with every call, potentially reducing input token costs.

However, I’m uncertain whether the Assistants API still resends the full prompt—including instructions—internally for each Thread call.

Can anyone clarify how the cost components work for the Assistants API with Threads? Specifically:

  • Does it genuinely reduce input tokens compared to the Completions API?

Any insights or experiences would be greatly appreciated!

Thank you!


Assistants uses ChatCompletions under the hood.

Even though it abstracts away conversation management through Threads, you still pay for the full conversation on each call, minus any potential discounts from caching.
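One practical consequence of those caching discounts: they apply to a repeated, byte-identical prompt prefix, so it helps to keep the long static instructions at the very front of the payload and append only the per-turn content. A minimal sketch, assuming the Chat Completions message format (the exact caching rules and discount rates are worth verifying in the docs):

```python
# Keep the unchanging instructions as a stable prefix so repeated
# calls can benefit from prompt caching; only the suffix varies.
STATIC_INSTRUCTIONS = "<5000-token instruction prompt>"  # never changes

def build_messages(history, user_msg):
    """Static prefix first, variable conversation suffix last."""
    return (
        [{"role": "system", "content": STATIC_INSTRUCTIONS}]
        + history
        + [{"role": "user", "content": user_msg}]
    )
```

Anything that mutates the beginning of the prompt between calls (timestamps, per-user data interpolated into the instructions) would defeat the prefix match and forfeit the discount.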

No.


Thanks @RonaldGRuckus, I suspected exactly that.

My supposition is that the Assistants API with Threads builds a prompt list comprising:

<Assistant Prompt>
<User Msg1>
<AI Msg>
<...>

and then sends it to the Completions API.
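That supposition can be sketched as a Chat Completions-style message list. The role names follow the documented Completions format, but the Assistants API's actual internal payload is not public, so this is a guess at its shape, not a confirmed implementation:

```python
# Hypothetical reconstruction of what a Thread run sends to the model
# on each call: instructions plus the full accumulated history.
messages = [
    {"role": "system", "content": "<Assistant instructions>"},
    {"role": "user", "content": "<User Msg 1>"},
    {"role": "assistant", "content": "<AI Msg 1>"},
    # ...every prior turn stored in the thread...
    {"role": "user", "content": "<Current user msg>"},
]

# Every entry above counts toward input tokens on every run,
# which is why Threads don't reduce input-token cost by themselves.
```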


@RonaldGRuckus does this apply to the assistant instructions too? Am I charged for system instructions every time I run a thread?

Every time an API call is made, all information for that thread is sent to the model: all of the past messages, the system instructions, the current user prompt, and any RAG data from uploaded files.
