Assistant API - What are Context Tokens in the Billing calculation?

But those aren’t considered input tokens, right? Then how do we calculate the price?

Me too: 759,937 tokens across 250 API calls during two days of personal testing. Imagine how much it would cost if it were live on a website with thousands of users. I could go bankrupt fast.
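For a rough sense of what that means in dollars, here is a quick sketch; the per-token rate below is an illustrative assumption, not the actual price, so check the current pricing page for your model:

```python
# Rough cost estimate for the context (input) tokens above.
# The rate is an ILLUSTRATIVE ASSUMPTION; real pricing varies by model.
context_tokens = 759_937
price_per_1k_input = 0.01  # assumed $/1K input tokens

cost = context_tokens / 1000 * price_per_1k_input
print(f"~${cost:.2f} for input tokens alone")  # ~$7.60 at this assumed rate
```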

I am hoping they manage to streamline the use of retrieval.


I had the same experience. I scraped hundreds of web pages into text and stored them in JSON format, with url, title, and content attributes, roughly 500 KB in total.
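The shape of the data was roughly this (the values here are made-up examples, just to show the structure):

```python
# Structure of the scraped data; values are made-up examples.
pages = [
    {
        "url": "https://example.com/pricing",
        "title": "Pricing Plans",
        "content": "Full text scraped from the page...",
    },
    # ... hundreds more entries, ~500 KB in total
]
```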

Using the Assistant with retrieval, if you ask a simple question such as “are you human?”, it generates very few tokens (500 in, 37 out).

If you ask a simple question about the JSON such as “tell me briefly what you know about the document”, it generates thousands of tokens (4,000–6,000 in, 300 out).

If you ask a complex question such as “My website uses WordPress and has monthly traffic of 50K; recommend which plan is best for me and guide me step by step through setting up the tracking”, it generates 13K–32K tokens (13K–32K in, 500 out).
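If you want to verify the split yourself, the run object reports token usage once it completes. A minimal sketch, assuming the openai Python SDK; the IDs are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Retrieve a completed run; both IDs below are placeholders.
run = client.beta.threads.runs.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
)

# usage is populated after the run finishes.
print(run.usage.prompt_tokens)      # context (input) tokens billed
print(run.usage.completion_tokens)  # generated (output) tokens
print(run.usage.total_tokens)
```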

I guess the tokens aren’t consumed arbitrarily; it depends on how many tokens the Assistant retrieves from the files before it finds the answer. What if I had 10 files of 5 MB each? Would it consume 150K–320K tokens every time someone used the Assistant?
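One way to bound the worst case is to count the tokens in your files up front. A sketch using the tiktoken library; the file path is a placeholder:

```python
import tiktoken

# cl100k_base is the encoding used by the GPT-4/GPT-3.5 family.
enc = tiktoken.get_encoding("cl100k_base")

with open("scraped_pages.json") as f:  # placeholder path
    text = f.read()

tokens = len(enc.encode(text))
print(f"{tokens} tokens if the model read the entire file")
# Retrieval should inject only a fraction of this per question,
# but it gives an upper bound on what a single call could cost.
```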

I have url and title attributes in the JSON, but I’m not sure whether retrieval is smart enough to read the title and pull in only the most relevant content instead of reading everything.

I believe all of this contributed to the context tokens as well, since I have 16 times more context tokens than generated tokens. If retrieval has already charged tokens for reading files within threads, it shouldn’t charge more context tokens on top of that.

IMO, we shouldn’t end up paying more, in an arbitrary number of tokens, every time the Assistant tries to access the files; how intelligently retrieval locates answers by the shortest path shouldn’t be at our cost.


Is there any update on this? I’m experiencing similar issues now.


The update is that Assistants works the same way: you let the AI use its own tools and make multiple internal calls based on its internal decision-making.

The Assistants API now uses a vector store search if you send the new beta v2 header and have created vector storage. The number of context tokens billed will differ between similar inputs: when the AI decides, or has been instructed, to use the search, it makes two internal API calls, and the second one, which answers you with the retrieved data, also contributes to that run’s and future turns’ context (input) tokens.
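For reference, a minimal v2 setup looks roughly like this; a sketch assuming the openai Python SDK, with placeholder names and file paths:

```python
from openai import OpenAI

client = OpenAI()

# Create a vector store and index a file into it (placeholder path).
vector_store = client.beta.vector_stores.create(name="docs")
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("docs.md", "rb")],  # placeholder file
)

# Attach the store to an assistant via the file_search tool.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```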

You can specify the number of past chat turns to preserve in runs, which can limit the expense (and understanding) of long sessions.
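Concretely, that is the truncation_strategy parameter on a run; a sketch with placeholder IDs:

```python
from openai import OpenAI

client = OpenAI()

# Keep only the most recent turns in the prompt to cap context tokens.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",   # placeholder
    assistant_id="asst_abc123",  # placeholder
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 4,  # number of recent messages to preserve
    },
)
```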