Using too many tokens for "incoming" requests

While my Python program is sending its requests (user prompts) to the Assistant, each request uses nearly 500 input tokens, even though the prompt is only 8–15 words.
With each new few-word prompt, the "In" token count increases by roughly the same amount (~500) (screenshot Nr3).
I understand that within a single Thread the model "reads" the previous messages in the current chat, and that's fine. But it seems those ~500 "In" tokens are added on every request.
Does anybody know what might be the issue here?
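For illustration, here is a minimal sketch of why "In" tokens grow on every request in a single Thread: each run re-sends the assistant's instructions plus the full prior history along with the new prompt. The function name, the ~400-token instruction size, and the turn sizes below are all hypothetical, just to show the accumulation pattern:

```python
def cumulative_input_tokens(instruction_tokens, turns):
    """Estimate the input ("In") tokens billed per request in one thread.

    turns is a list of (user_prompt_tokens, assistant_reply_tokens) pairs.
    Each request re-sends the instructions plus the entire prior history,
    so input tokens grow with every turn even for short prompts.
    """
    history = 0
    per_request = []
    for user_tokens, reply_tokens in turns:
        # This request's input = instructions + all prior messages + new prompt
        per_request.append(instruction_tokens + history + user_tokens)
        # Both the prompt and the reply become history for the next request
        history += user_tokens + reply_tokens
    return per_request

# Hypothetical numbers: ~400-token instructions, three short prompts
print(cumulative_input_tokens(400, [(10, 100), (12, 110), (9, 95)]))
```

Under these assumed numbers, even a 10-word prompt costs hundreds of input tokens on the first request, and the cost keeps climbing as history accumulates.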



If you are making use of the retrieval feature, the Assistants engine may use up to 128k tokens of context to answer the query, so I imagine that function is what creates the initial ~500 tokens of usage.


Thank you. What does "use of the retrieval feature" mean?

If you have uploaded documents that should be used when answering questions, then you are making use of the retrieval feature.


I have not uploaded any documents for this Assistant.