Pricing for assistant instructions and threads/runs

I have seen that instructions have a limit of 256k characters, and that its passed on for every response from a thread. How is the pricing for this calculated? Is it under input token rates? Is it charged everytime?

The pricing on official page only details for the tools. What’s the charge of using threads and runs? Is the pricing accounted for past messages in thread when retrieving a thread/starting a run?

The system message is measured in tokens for billing and encoded into tokens for use by the AI, despite that the API blocks after a certain number of characters.

It is not just charged every time, with assistants and the internal tools they can use that require multiple calls to the the AI model, the instructions, like conversation history in a thread that also must be re-sent for understanding, can be billed multiple times to obtain a single response.

Somebody else’s code is managing sending the messages again to the stateless model with Assistants. When you interact directly with an AI model on the chat completions endpoint, you can decide how many tokens to send.

1 Like

Thank you!

Do we have a work around for this? I assume I can fine-tune a model with these instructions as system message then use it for my assistant.

That’s not quite how fine-tuning works.

You don’t train on instructions, you train on examples of what to produce.

Putting instructional system message in your training but then not the matching instructions will make your following of examples worse, as the AI follows patterns.

I did a simple test to get an understanding of the pricing structure. On the same thread, I input the same user message twice and got the same assistant messages. In between the runs, I took screenshots of my billing activity.

Bill after first run: 1 request, 879 tokens
Bill after second run: 2 requests, 2367 tokens

We can see that the bill for the second run is 1488, almost 2 times the amount of the first run. If we model off of this, the pricing model for threads is an arithmetic series:
S = 1+2+3+4+...+n

So as the conversation gets longer, your wallet also halves every time…


1 Like