Does the pricing for the Assistant API charge only for the latest message and its output, or does it also include the cost of the entire conversation history within a thread?

Hello, I have a query regarding the pricing structure of the Assistant API, specifically related to the use of threads and conversation history. In the documentation, it’s mentioned that pricing is applied to both inputs and outputs, which is clear to me. However, I’m uncertain about how charges are calculated when a new message is sent within a thread that contains previous conversation history. My question is: are the charges applied only to the new message and its corresponding output, or are they also applied to the entire conversation history (i.e., history + current message) along with the output? This point isn’t explicitly covered in the documentation, and I would appreciate clarification on which method is used for pricing.

Hi! Welcome to the forum!

The long and short of it is that you will be charged for everything.

You will be charged for the whole thread, the retrieved documents, your new query, and the new output every time the thread runs.
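To make that concrete, here is a rough sketch of how the billed tokens add up from one run to the next. It assumes the whole history is resent as input on every run with no truncation, and the function name and token counts are made up for illustration, not taken from the API:

```python
def billed_tokens_per_run(message_pairs, retrieved_tokens=0):
    """Estimate billed tokens for each run in a thread.

    message_pairs: list of (user_tokens, assistant_tokens), one per turn.
    retrieved_tokens: hypothetical tokens of retrieved documents added to each run.
    Assumes the full history is resent as input on every run.
    """
    billed = []
    history = 0  # tokens of all earlier user + assistant messages
    for user_tokens, assistant_tokens in message_pairs:
        input_tokens = history + retrieved_tokens + user_tokens
        billed.append(input_tokens + assistant_tokens)
        # The new exchange becomes part of the history for the next run.
        history += user_tokens + assistant_tokens
    return billed


# Three identical turns: 100-token question, 200-token answer, 50 retrieved tokens.
print(billed_tokens_per_run([(100, 200)] * 3, retrieved_tokens=50))
# -> [350, 650, 950]
```

Even though each turn adds the same amount of new text, every run is billed more than the previous one because the history rides along as input.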

You’re not saving any money by using assistants.

Hope this helps!


Can confirm that threads indeed are charging you for everything.

Here’s a simple test I ran. On the same thread, I input the same user message twice and got the same assistant messages. In between the runs, I took screenshots of my billing activity.

Bill after first run: 1 request, 879 tokens
Bill after second run: 2 requests, 2367 tokens

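We can see that the second run on its own was billed 2367 - 879 = 1488 tokens, about 1.7 times the first run, because it resent the first exchange as input. If we model off of this, with each run costing roughly one more "turn" of tokens than the one before, the cumulative cost of a thread is an arithmetic series:
S = 1 + 2 + 3 + 4 + ... + n = n(n+1)/2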

So as the conversation gets longer, each new run costs more than the last, and the total bill grows roughly with the square of the number of turns…
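As a rough sanity check, the two billing readings from the test can be fitted to a simple linear model: a fixed per-run overhead plus a per-turn amount that is resent each time. This is only a sketch; the split into "overhead" and "per turn" is my assumption, and two data points are not enough to confirm it:

```python
# Readings from the test above.
FIRST_RUN = 879            # tokens billed for run 1
SECOND_RUN = 2367 - 879    # tokens billed for run 2 alone (1488)

# Assumed model: run n costs fixed_overhead + per_turn * n.
per_turn = SECOND_RUN - FIRST_RUN      # 609 tokens added by each extra turn
fixed_overhead = FIRST_RUN - per_turn  # 270 tokens present in every run


def run_cost(n):
    """Estimated tokens billed for the n-th run (1-based) under the assumed model."""
    return fixed_overhead + per_turn * n


def cumulative_cost(n):
    """Estimated total tokens billed after n runs."""
    return sum(run_cost(k) for k in range(1, n + 1))


print(run_cost(1), run_cost(2))  # -> 879 1488
print(cumulative_cost(2))        # -> 2367, matches the billing page
print(cumulative_cost(10))       # projected total after 10 identical turns
```

Under this model the total after n runs is fixed_overhead * n + per_turn * n(n+1)/2, so the quadratic term dominates once the thread gets long.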