Pricing for assistant instructions and threads/runs

amjuks · June 6, 2024, 10:52am

I have seen that instructions have a limit of 256k characters, and that they are passed along for every response from a thread. How is the pricing for this calculated? Is it billed at input token rates? Is it charged every time?

The official pricing page only details the charges for tools. What’s the charge for using threads and runs? Does the pricing account for past messages in a thread when retrieving a thread or starting a run?

_j · June 6, 2024, 1:21pm

The system message is encoded into tokens for use by the AI and is billed in tokens, even though the API enforces its length limit in characters.

It is not just charged every time. Assistants can use internal tools that require multiple calls to the AI model, so the instructions, like the conversation history in a thread that must also be re-sent for understanding, can be billed multiple times to obtain a single response.

With Assistants, somebody else’s code manages re-sending the messages to the stateless model. When you interact directly with an AI model on the chat completions endpoint, you can decide how many tokens to send.
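
To make the re-sending concrete, here is a rough sketch of how billed input tokens could grow per run. It is only a model under stated assumptions, not how the Assistants backend actually counts: the function name, the token figures, and the `model_calls_per_run` parameter are all made up for illustration, and it ignores tool output and any truncation of the thread.

```python
def billed_input_tokens(instruction_tokens, turn_tokens, model_calls_per_run=1):
    """Estimate input tokens billed for each run of a thread.

    Hypothetical model: every model call re-sends the instructions plus
    the whole thread so far. turn_tokens is a list of
    (user_tokens, assistant_tokens) pairs, one per run.
    """
    per_run = []
    history = 0  # tokens of earlier messages already stored in the thread
    for user_tokens, assistant_tokens in turn_tokens:
        # Context for this run: instructions + prior messages + new user message.
        context = instruction_tokens + history + user_tokens
        # Internal tool use can mean several model calls for one response.
        per_run.append(context * model_calls_per_run)
        # Both the user message and the reply become history for the next run.
        history += user_tokens + assistant_tokens
    return per_run


# Illustrative numbers only: 500-token instructions, 50-token questions,
# 200-token answers.
runs = billed_input_tokens(500, [(50, 200), (50, 200), (50, 200)])
print(runs)  # [550, 800, 1050]
```

Setting `model_calls_per_run=2` doubles each figure, which is the "billed multiple times to obtain a single response" effect. Real tool calls would cost more than this, since the tool output is also added to the context.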

amjuks · June 7, 2024, 12:08pm

Thank you!

Do we have a workaround for this? I assume I can fine-tune a model with these instructions as the system message and then use it for my assistant.

_j · June 7, 2024, 1:25pm

That’s not quite how fine-tuning works.

You don’t train on instructions, you train on examples of what to produce.

Putting an instructional system message in your training data but then not sending the matching instructions at inference will make the model follow your examples worse, because the AI follows patterns.

j.o · June 12, 2024, 3:19am

I did a simple test to get an understanding of the pricing structure. On the same thread, I input the same user message twice and got the same assistant messages. In between the runs, I took screenshots of my billing activity.

Bill after first run: 1 request, 879 tokens
Bill after second run: 2 requests, 2367 tokens

We can see that the bill for the second run is 1488 tokens (2367 - 879), roughly 1.7 times the amount of the first run. If we model off of this, the cumulative bill for a thread grows roughly like an arithmetic series:
S = 1 + 2 + 3 + 4 + ... + n = n(n+1)/2

So as the conversation gets longer, your wallet also halves every time…
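
As a quick check of the arithmetic, here is a small sketch using the two billing totals from the test. The `cumulative_units` helper is hypothetical and assumes every run adds one equal-sized "unit" of context, which the measured numbers only roughly support.

```python
# Billing totals reported after each run (cumulative tokens).
first_total = 879
second_total = 2367

# Tokens attributable to the second run alone.
second_run = second_total - first_total
ratio = second_run / first_total

print(second_run)       # 1488
print(round(ratio, 2))  # 1.69


def cumulative_units(n):
    """Sum 1 + 2 + ... + n, assuming run k costs k equal units."""
    return n * (n + 1) // 2


# Under that assumption, ten runs cost 55 units rather than 10.
print(cumulative_units(10))  # 55
```

The second run coming in under 2x is consistent with part of each run's cost being fixed (the instructions and the user message) while only the history portion grows.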
