Too many input tokens are used by Assistant

Yes, tool returns are also maintained as past conversation in a growing thread, with no option presented to expire or delete these hidden turns.

Unless you specifically tune the parameters, the file_search tool will return its maximum number of result chunks even when the documents have zero relevance, loading the maximum amount of billed tokens at every internal turn.

This post has clearer documentation of the file_search ranker, where you can set a similarity threshold so that completely unrelated document chunks aren’t maximizing the cost:

You can start at 0.40-0.50.
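
As a minimal sketch with the Python SDK: the assistant is created with both a lower result count and a score threshold on file_search. The vector store ID, model choice, and the specific numbers here are placeholders/assumptions, not a definitive recipe.

```python
from openai import OpenAI

client = OpenAI()

# Sketch: tune file_search so low-relevance chunks are filtered out
# and fewer chunks are injected on each internal turn.
assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="Answer from the attached documents when relevant.",
    tools=[{
        "type": "file_search",
        "file_search": {
            "max_num_results": 8,          # fewer than the default chunk count
            "ranking_options": {
                "ranker": "auto",
                "score_threshold": 0.45,   # drop chunks scoring below this
            },
        },
    }],
    # "vs_your_store_id" is a placeholder for your existing vector store
    tool_resources={"file_search": {"vector_store_ids": ["vs_your_store_id"]}},
)
```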

There is also no token limit or budget for you to set; an internal limit obviously exists so the model's capability isn't exceeded, but it is not exposed to you. You can, however, limit the number of past turns with the run's truncation_strategy parameter.
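
A minimal sketch of that, again with the Python SDK; the thread and assistant IDs and the turn count are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Sketch: only the most recent thread turns are sent to the model on this run,
# instead of the entire growing conversation.
run = client.beta.threads.runs.create(
    thread_id="thread_your_id",        # placeholder
    assistant_id="asst_your_id",       # placeholder
    truncation_strategy={
        "type": "last_messages",
        "last_messages": 6,            # number of most recent turns to keep
    },
)
```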