Why are my context tokens used so quickly?

When using the Retrieval function of the Assistants API (I uploaded six books for retrieval), why are my context tokens used up so quickly? A single question consumes 20,000-30,000 context tokens. Is this normal? Is it because of the polling method? Is it the result of multiple queries, or does it take tokens to search the content of the books? Is there any way to optimize this?

If you upload retrieval information, the assistant backend will remorselessly fill the AI model’s context window with that information, and if the results are not optimal, the AI has its own function to call for more documentation in another turn…
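To see where your tokens actually go, you can inspect the usage reported on a finished run. A minimal sketch, assuming the openai Python SDK and an API version recent enough that run objects carry a usage field (the thread and run IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Inspect how many prompt (context) tokens a completed run consumed.
# thread_id / run_id below are placeholders for your own IDs.
run = client.beta.threads.runs.retrieve(
    thread_id="thread_abc123",
    run_id="run_abc123",
)

if run.usage:  # usage is populated once the run completes
    print("prompt tokens:    ", run.usage.prompt_tokens)
    print("completion tokens:", run.usage.completion_tokens)
    print("total tokens:     ", run.usage.total_tokens)
```

A large prompt_tokens count on a simple question is the retrieval injection at work.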

I wrote a response earlier about the sources of token use; click “Context tokens…”:

https://platform.openai.com/docs/assistants/how-it-works/context-window-management or https://platform.openai.com/docs/assistants/tools/how-it-works


Thank you. Can I understand it this way: no matter what question I ask, retrieval will occupy 16K tokens, and with some other information my usage reaches 20,000+? Is there any way to limit token use, or any relevant code recommendations?

The cost associated with the autonomous nature of agents has been an ongoing concern.

Identified the day of release:

And more the same week:

No controls have been implemented in the two months since, other than your choice of model and its maximum context length.
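That model choice is the one lever the endpoint gives you. A sketch, assuming the model names below are available on your account: picking a 16k-context model caps what retrieval can inject per request, which is why usage tends to land around the 16K figure mentioned above.

```python
from openai import OpenAI

client = OpenAI()

# The only built-in cap: pick a model with a smaller context window.
# A 16k model limits how much the retrieval tool can stuff into each
# request, versus a 128k model like gpt-4-1106-preview.
assistant = client.beta.assistants.create(
    name="book-qa",                   # hypothetical name
    model="gpt-3.5-turbo-1106",       # 16k context window
    tools=[{"type": "retrieval"}],
    instructions="Answer questions using the attached books.",
)
```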

The solution for budget control is to continue using the chat completions endpoint. There you have the inconvenience of needing to unpack “files” into text the AI can understand, but the advantage that you can make the injected knowledge (search “RAG”, retrieval-augmented generation) highly relevant to the current user input rather than relying on repeated calls to internal tools. You can also control how much chat history is actually sent.
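A minimal sketch of that approach, assuming the openai Python SDK and numpy; the model names, the pre-chunked passages, and the helper names are placeholders, and splitting your books into chunks is left out:

```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(texts):
    """Embed a batch of text chunks with a small embedding model."""
    resp = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([d.embedding for d in resp.data])

# chunks: your books, pre-split into passages (splitting code omitted)
chunks = ["passage one ...", "passage two ...", "passage three ..."]
chunk_vecs = embed(chunks)  # compute once; cache to disk in practice

def answer(question, history, top_k=3, max_history=6):
    """Inject only the top_k most relevant chunks plus trimmed history."""
    q_vec = embed([question])[0]
    # Cosine similarity; OpenAI embeddings are already unit-normalized,
    # so a dot product suffices.
    scores = chunk_vecs @ q_vec
    best = [chunks[i] for i in np.argsort(scores)[::-1][:top_k]]

    messages = [
        {"role": "system",
         "content": "Answer from this documentation:\n\n"
                    + "\n---\n".join(best)},
        *history[-max_history:],   # you decide how much history is sent
        {"role": "user", "content": question},
    ]
    resp = client.chat.completions.create(
        model="gpt-4-turbo", messages=messages
    )
    return resp.choices[0].message.content
```

With top_k and max_history as knobs, the prompt size per question is bounded and predictable, instead of whatever the retrieval tool decides to inject.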
