Too many input tokens used by the Assistant

I am using the OpenAI Assistants API. I have attached 4 knowledge files of about 50 pages each. The first question to my assistant is fine: the input is just my prompt plus the question. After the 2nd and subsequent questions, however, the input token counts become huge (70k or even 90k). Has anyone come across this issue before? A follow-up question: when retrieving data from the knowledge files, does the assistant pull in all of that large context? And is that retrieved context saved to the thread and sent to the assistant on every run request?

Yes, tool returns are also maintained as part of the past conversation in a growing thread, with no option to expire or delete these hidden turns.
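You can watch this happen: each run reports the tokens it actually billed, so the prompt side can be seen growing turn by turn. A minimal sketch with the Python SDK (the thread and run IDs are placeholders from your own application):

```python
from openai import OpenAI

client = OpenAI()

# After a run reaches a terminal state, its usage shows what was billed.
run = client.beta.threads.runs.retrieve(
    thread_id="thread_abc123",  # placeholder
    run_id="run_abc123",        # placeholder
)

print(run.status)                  # "completed", "incomplete", ...
print(run.usage.prompt_tokens)     # grows as file_search results pile up in the thread
print(run.usage.completion_tokens)
print(run.usage.total_tokens)
```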

Unless you specifically tune the parameters, you will get the maximum number of results back from the file search tool even when the documents have zero relevance, and therefore the maximum load of billed tokens at every internal turn.

This post has clearer documentation of the file_search ranker, where you can set a similarity threshold so that unrelated document junk doesn't keep inflating the cost:

You can start with a score threshold of 0.40-0.50.
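A minimal sketch of setting both knobs on an existing assistant with the Python SDK (the assistant ID is a placeholder, and 0.45 is just a starting point in that range):

```python
from openai import OpenAI

client = OpenAI()

# Cap how much file_search can inject per internal turn.
assistant = client.beta.assistants.update(
    "asst_abc123",  # placeholder: your assistant ID
    tools=[
        {
            "type": "file_search",
            "file_search": {
                "max_num_results": 8,          # fewer than the default cap of 20 chunks
                "ranking_options": {
                    "ranker": "auto",
                    "score_threshold": 0.45,   # drop clearly unrelated chunks
                },
            },
        }
    ],
)
```

The same tool configuration can also be passed per run if you only want to tighten retrieval for particular requests.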

There is also no token limit or budget for you to set on what retrieval returns; that internal parameter (obviously necessary so the model's context is not exceeded) is not exposed to you. You can, however, limit the number of past turns that are sent by using the run's truncation_strategy parameter.
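For example, with the Python SDK (the IDs are placeholders, and 6 is an arbitrary choice):

```python
from openai import OpenAI

client = OpenAI()

# Only feed the model the most recent turns of the thread on this run.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",     # placeholder
    assistant_id="asst_abc123",    # placeholder
    truncation_strategy={
        "type": "last_messages",   # instead of the default "auto"
        "last_messages": 6,        # number of most recent thread messages to include
    },
)
```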

Thank you, I have implemented truncation_strategy. However, I have not been able to solve my second problem: runs stuck indefinitely in the "incomplete" status. By the way, I am using both an OpenAI account and Azure OpenAI. I am not having any issues with OpenAI; the problem only arises with the Azure OpenAI account.
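In case it helps anyone reproduce it, here is roughly how I am checking the stuck run on the Azure side (the endpoint, key, API version, and IDs are placeholders from my own setup):

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder
    api_key="...",                                          # placeholder
    api_version="2024-05-01-preview",                       # placeholder
)

run = client.beta.threads.runs.retrieve(
    thread_id="thread_abc123",  # placeholder
    run_id="run_abc123",        # placeholder
)

print(run.status)              # stays "incomplete" on Azure
print(run.incomplete_details)  # reason, e.g. max_prompt_tokens / max_completion_tokens
print(run.last_error)          # populated only when the run actually failed
```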