The Assistants API can ask the developer to make function calls, and there's a simple way to submit the results back. When that happens, two model calls are made. How are we billed for those calls? And what chat history and prompt does the second call use?
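For concreteness, here's roughly the flow I mean — a minimal sketch with the openai Python SDK, where the thread/assistant IDs are placeholders and `get_weather` is a hypothetical local function:

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> dict:
    # Hypothetical local function the assistant is allowed to call.
    return {"city": city, "temp_c": 21}

# First model call: the run pauses when the model requests a function call.
run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_...",   # placeholder thread ID
    assistant_id="asst_...",  # placeholder assistant ID
)

if run.status == "requires_action":
    tool_outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        args = json.loads(call.function.arguments)
        tool_outputs.append(
            {"tool_call_id": call.id, "output": json.dumps(get_weather(**args))}
        )
    # Submitting the outputs triggers the second model call.
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=run.thread_id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )
```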
Does (stateful) Assistants threading include something smart to prevent reprocessing of the chat history? In other words, is it faster than the bare-bones (stateless) approach of pruning the chat history ourselves and resubmitting it with each Chat Completions request?
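By "pruning ourselves" I mean something like this sketch (the history, cap, and model name are just examples):

```python
from openai import OpenAI

client = OpenAI()

# Full conversation so far; in practice this grows with every turn.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Summarize our chat so far."},
]

# Prune: keep the system message plus the most recent N messages.
MAX_MESSAGES = 2
pruned = [history[0]] + history[-MAX_MESSAGES:]

response = client.chat.completions.create(model="gpt-4o", messages=pruned)
print(response.choices[0].message.content)
```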
Welcome to the community!
I suspect no one really knows. I’m fairly certain you always get billed for all tokens in or out, regardless of what the model does.
Does (stateful) Assistants threading include something smart to prevent reprocessing of the chat history?
I don’t think so. You have the run parameters max_prompt_tokens and truncation_strategy, but there’s nothing “smart” going on in the background to save you tokens.
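For what it's worth, both are set per run — a minimal sketch, assuming the openai Python SDK and placeholder IDs (the specific values are just examples):

```python
from openai import OpenAI

client = OpenAI()

run = client.beta.threads.runs.create_and_poll(
    thread_id="thread_...",   # placeholder thread ID
    assistant_id="asst_...",  # placeholder assistant ID
    # Cap the total prompt tokens this run may consume across its model calls.
    max_prompt_tokens=2000,
    # Or truncate the thread to the most recent messages before each model call.
    truncation_strategy={"type": "last_messages", "last_messages": 10},
)
```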
In my experience, Assistants are definitely slower.