As I discussed, what is being measured and shown is the lengthy internal description of tools that gets injected into the Assistant AI model's context.
An internal prompt that begins "You have a file search tool…" and continues for many paragraphs adds a built-in expense whenever file search on vector stores is enabled.
Ask a question that makes the AI perform a document search instead of just saying hello, and the input cost of the response will jump from around 800 tokens to 10,000+ tokens…
The run-level parameter for the maximum number of retrieved chunks was not being respected the last time I checked the Assistants in the Playground UI.
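For reference, here is a minimal sketch of how that limit can be set via the API rather than the Playground. This assumes the `openai` Python SDK's Assistants v2 interface, where the `file_search` tool accepts a `max_num_results` option that can be passed as a per-run tool override; the names `thread` and `assistant` are placeholders for objects you would have created earlier:

```python
# Hedged sketch: capping how many file-search chunks are returned per run,
# which directly reduces the retrieved-context input tokens.
# Assumes Assistants API v2 and the openai Python SDK.
file_search_override = {
    "type": "file_search",
    "file_search": {"max_num_results": 4},  # fewer chunks -> fewer input tokens
}

# Passed at run creation time (not executed here), e.g.:
# run = client.beta.threads.runs.create(
#     thread_id=thread.id,
#     assistant_id=assistant.id,
#     tools=[file_search_override],
# )
```

Whether this override is honored by the Playground UI is a separate question from whether the API accepts it, which is why testing via the SDK directly is worth doing.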