Does the pricing for the Assistant API charge only for the latest message and its output, or does it also include the cost of the entire conversation history within a thread?

celikmustafa89 · January 29, 2024, 3:58pm

Hello, I have a query regarding the pricing structure of the Assistant API, specifically related to the use of threads and conversation history. In the documentation, it’s mentioned that pricing is applied to both inputs and outputs, which is clear to me. However, I’m uncertain about how charges are calculated when a new message is sent within a thread that contains previous conversation history. My question is: are the charges applied only to the new message and its corresponding output, or are they also applied to the entire conversation history (i.e., history + current message) along with the output? This point isn’t explicitly covered in the documentation, and I would appreciate clarification on which method is used for pricing.

Diet · January 29, 2024, 4:22pm

Hi! Welcome to the forum!

The long and short of it is that you will be charged for everything.

You will be charged for the whole thread, the retrieved documents, your new query, and the new output every time the thread runs.

You’re not saving any money by using assistants.

Hope this helps!

j.o · June 12, 2024, 3:13am

Can confirm that threads indeed are charging you for everything.

Here’s a simple test I ran. On the same thread, I input the same user message twice and got the same assistant messages. In between the runs, I took screenshots of my billing activity.

Bill after first run: 1 request, 879 tokens
Bill after second run: 2 requests, 2367 tokens

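We can see that the second run on its own was billed 1488 tokens (2367 - 879), roughly 1.7 times the first run, because it re-sends the first exchange as input. If we model off of this, with each run costing about one more turn's worth of tokens than the one before, the total cost of a thread after n runs grows like an arithmetic series:
S = 1 + 2 + 3 + 4 + ... + n = n(n+1)/2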

So as the conversation gets longer, your wallet also halves every time…
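
To make that model concrete, here is a minimal Python sketch. It assumes every turn adds about the same number of tokens and that each run is billed for the whole history so far; the function names are mine, and this is not an official pricing formula.

```python
def run_cost(n: int, tokens_per_turn: int) -> int:
    """Tokens billed for the n-th run, assuming all n turns are sent as input/output."""
    return n * tokens_per_turn


def cumulative_cost(n: int, tokens_per_turn: int) -> int:
    """Total tokens billed after n runs: tokens_per_turn * (1 + 2 + ... + n)."""
    return tokens_per_turn * n * (n + 1) // 2


# Figures from the billing screenshots above.
first_run = 879
total_after_two = 2367
second_run = total_after_two - first_run  # tokens billed for the second run alone

# What the simple model would predict for the same two runs.
predicted_second_run = run_cost(2, first_run)
predicted_total = cumulative_cost(2, first_run)

print(second_run, predicted_second_run, predicted_total)
```

In my test the second run came in under what the model predicts, so treat the series as a rough upper-bound shape for growth rather than an exact bill.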

AICurious · October 23, 2024, 11:40pm

In the Playground, I see the behavior described by @j.o, in which EACH subsequent message in a thread appears to require the tokens necessary for all previous messages, plus the new message. So, yes, some kind of horribly increasing arithmetic sum.

However, when I submit things through the API, it’s actually almost the reverse. Subsequent messages in a thread report a much smaller token usage than they would have on their own (as per the returned run.usage.total_tokens). For example, submitting 2 different queries in their own threads costs 10K tokens overall (~5K for each thread), but submitting them in the same thread costs only 6K overall (5K to get started, and then about 1K for each subsequent message in the thread). I love it, but… is it true?
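
For anyone who wants to check this on their own account, here is a rough Python sketch of how the per-run usage could be tallied. The `Usage` dataclass is a hypothetical stand-in for the `usage` object returned on a completed run (which carries `prompt_tokens`, `completion_tokens` and `total_tokens`); `tally` is my own helper name, and the numbers at the bottom are placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Usage:
    """Hypothetical stand-in for the usage object attached to a finished run."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def tally(usages: Iterable[Optional[Usage]]) -> Dict[str, int]:
    """Sum token counts across runs, skipping runs that have no usage yet."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for usage in usages:
        if usage is None:  # usage is only filled in once a run has finished
            continue
        totals["prompt_tokens"] += usage.prompt_tokens
        totals["completion_tokens"] += usage.completion_tokens
        totals["total_tokens"] += usage.total_tokens
    return totals


# Placeholder values only: a first run followed by a cheaper second run.
same_thread = [Usage(4800, 200, 5000), None, Usage(800, 200, 1000)]
print(tally(same_thread))
```

Splitting prompt tokens from completion tokens should show whether the saving comes from a smaller input on later runs, which is the part that would contradict the "whole history every time" model.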
