Assistant API - What are Context Tokens in the Billing calculation?

I noticed on my billing page, under “Usage” > “Activity”, that today I have for example 4104 generated tokens + 55572 context tokens, for a billed total of 59676 tokens! What are context tokens? And how can I control them?


Hi and welcome to the Developer Forum!

Are you using Assistants? That seems to be the tokens used to generate context from your uploaded data.


Thank you, Foxabilo! Yes, I am using Assistants now, but this kind of calculation also happened while using the Playground and Chat, so it doesn’t seem to be exclusive to Assistants. While using Chat there is no uploaded data!

Sure, but if you are using a combination of Assistants and normal chat, you will see them all bunched together, unless there are days separating them.


“Context” is OpenAI’s new language for “prompt” or input: it’s what is loaded into the AI model before it generates a language output (and maybe not an output shown to you at first).

Likely meant to blur the line between the pricing page’s “input” and “output”, so you don’t directly observe what happens when they load the Assistants model with a maximum input of “context” from chat “threads” and “knowledge retrieval”, and let the AI go wild with piecemeal browsing of documents, writing code, and iterating on errors with that context… you get a bill.

Once the size of the Messages exceeds the context window of the model, the Thread will attempt to include as many messages as possible that fit in the context window and drop the oldest messages.
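The truncation behavior described above can be sketched roughly like this; the function name and the toy word-count tokenizer are illustrative (in practice a real tokenizer such as tiktoken counts the tokens), not the actual Assistants implementation:

```python
# Hypothetical sketch: keep the newest messages that fit in the
# context window, dropping the oldest ones first.

def truncate_to_context(messages, count_tokens, context_window):
    """Return the newest messages whose combined token count fits."""
    kept = []
    total = 0
    for msg in reversed(messages):       # walk newest -> oldest
        tokens = count_tokens(msg)
        if total + tokens > context_window:
            break                        # oldest messages get dropped
        kept.append(msg)
        total += tokens
    return list(reversed(kept))          # restore chronological order

history = ["hello", "how are you", "tell me about context tokens"]
# Toy tokenizer: one token per word, window of 8 "tokens".
fitted = truncate_to_context(history, lambda m: len(m.split()), 8)
```

Here the oldest message (`"hello"`) is dropped because the two newer messages already fill the 8-token window.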

Retrieval currently optimizes for quality by adding all relevant content to the context of model calls.

It’s not going to exceed the context window of GPT-4-turbo, but at the maximum input setting that’s on the order of 124k tokens × $0.01 per 1k, per call.


Hello, we are having the same problem. Is there any workaround to control costs?

Seems not, as of now. What is utterly insane is that there is no token feedback at all using the Assistants API. If there were something like the “usage” key in chat completions, integrating custom token limits for any model would be a piece of cake.
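For contrast, chat completions do return a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, which makes running accounting trivial. A minimal sketch (the helper name is made up; the numbers are the ones from the original question):

```python
# Accumulate the "usage" object from each Chat Completions response
# into a running total -- the kind of feedback the Assistants API
# lacked at the time of this thread.

def add_usage(running, usage):
    """Add one response's usage counts into a running-total dict."""
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        running[key] = running.get(key, 0) + usage.get(key, 0)
    return running

totals = {}
# e.g. usage as reported for the billing numbers in the question:
add_usage(totals, {"prompt_tokens": 55572,
                   "completion_tokens": 4104,
                   "total_tokens": 59676})
```

With per-response totals like this, enforcing a custom token budget is just a comparison against `totals["total_tokens"]`.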


So true. The pro is less engineering, but with this pricing the cons outweigh it. We just finished implementing a product finder, and during testing we realized the staggering number of context tokens. It is pretty wild and not transparent at all.

We have downgraded the model to 3.5 for its narrower context window, but then retrieval becomes pretty useless.


I’m sticking with third-party providers for retrieval as of now. A lot more flexible, and cheaper if you host them locally. Good luck with your project :grinning:

Oh nice, thank you. Are there any particular third parties you would recommend we explore?

Probably Weaviate. Although there is a learning curve, it supports rerankers, keyword search, hybrid search, etc. So what you do is define a function tool named something like “search_products” with an appropriate schema, and then you pass the arguments given by the LLM to a vector store search function. Although it requires quite a bit more work, a local vector store gives you a plethora of techniques and options.
For example, you said you’re implementing a product finder, a local vector store search function could allow the model to specify:
- Date. If you have a release date or similar for each product.
- Categories. If your products are already sorted into different categories, allowing the model to choose which one to search could help a lot.
- In stock. Simply allow the model to filter out items not in stock at the moment.
- Price. Filter on price based on any potential user request.

Just to name a few. But as I said, it will require more work! :grin:
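Putting that recipe together, here is a minimal sketch: a function-calling tool definition with the filters above as parameters, and a dispatcher that applies them before searching. All names and fields are illustrative, and the substring match stands in for a real vector similarity search:

```python
# Hypothetical "search_products" tool schema plus a local dispatcher.
# In a real setup the final match step would be a vector store query
# (e.g. against Weaviate), not a substring comparison.

search_products_tool = {
    "type": "function",
    "function": {
        "name": "search_products",
        "description": "Search the product catalog.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "category": {"type": "string"},
                "in_stock": {"type": "boolean"},
                "max_price": {"type": "number"},
            },
            "required": ["query"],
        },
    },
}

def search_products(products, query, category=None, in_stock=None,
                    max_price=None):
    """Apply the model-chosen filters, then match on the query."""
    hits = []
    for p in products:
        if category is not None and p["category"] != category:
            continue
        if in_stock is not None and p["in_stock"] != in_stock:
            continue
        if max_price is not None and p["price"] > max_price:
            continue
        if query.lower() in p["name"].lower():   # stand-in for vector search
            hits.append(p["name"])
    return hits

catalog = [
    {"name": "Trail Shoe", "category": "footwear",
     "in_stock": True, "price": 89.0},
    {"name": "Road Shoe", "category": "footwear",
     "in_stock": False, "price": 120.0},
]
result = search_products(catalog, "shoe", in_stock=True, max_price=100)
```

When the LLM calls the tool, you parse its JSON arguments and pass them straight into `search_products`; the schema is what lets the model pick the filters on its own.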
