Does `context token` include the uploaded file in Assistant messages?

Hi, I’m wondering if someone who is familiar with the Assistant billing setup can help me understand how its context tokens are calculated.

So yesterday I was testing with an assistant with knowledge retrieval in playground.

I created an assistant. Then I asked about 5 or 6 questions, and for each question I attached a text file (each containing around ~4k English words). I did some quick summary-style work and was done. Then I deleted all the assistants and files.

Later, I checked the billing, and there were around 170K context tokens and 1K generated tokens.

I’m wondering what exactly those 170K context tokens were. Does the assistant reading files count toward context tokens?

I’m a little bit confused here. Essentially, I’m asking how I can estimate my costs when working with an assistant and files.



More context: in those assistant responses, I saw references (raw text) to the files. Do those count as context tokens?

Anything sent to the model is counted.

If the model retrieves text from a file, that text is context and is billed accordingly.
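To put rough numbers on "everything sent is counted", here is a minimal back-of-the-envelope estimator. The 4/3 tokens-per-English-word ratio is only a common rule of thumb (exact counts require the model's tokenizer, e.g. the tiktoken library), and the helper names are my own, not part of the API:

```python
# Rough estimator for Assistants context-token cost per run.
# Assumption: a run is billed for everything sent to the model:
# instructions + retrieved file text + the thread history + the question.

def approx_tokens(words: int) -> int:
    """~4 tokens for every 3 English words (rule of thumb only)."""
    return round(words * 4 / 3)

def context_tokens_per_run(file_words: int, question_words: int,
                           history_words: int = 0) -> int:
    """Estimated context tokens billed for one run."""
    return (approx_tokens(file_words)
            + approx_tokens(question_words)
            + approx_tokens(history_words))

# The original poster's case: each question attached a ~4k-word file.
per_file = approx_tokens(4000)
print(per_file)  # a single 4k-word file alone is roughly 5.3k tokens
```

Six runs that each re-send a ~4k-word file would already account for ~32k context tokens before instructions and accumulated history are added, which is why the bill can climb far beyond the visible question text.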


I have a similar question related to costs. I’ve noticed that when I create an Assistant with files, the ‘context token’ count gets really high. So my question is: are files read once per thread, or is each message “poisoned” by the files’ content, so that every time I ask a question in the same thread it generates a high ‘context token’ cost?


  • Let’s assume we have a file which contains 10k tokens
  • I create a thread
  • I create a message with 100 tokens
  • I run the Assistant
  • I get a 10.1k-token cost for ‘context token’ and X tokens for the produced result (this part is clear)
  • Then I write a SECOND message in the existing thread, which also contains 100 tokens.

Question: how many context tokens will the second message cost?
a) 10.1k = 10k from the file + 0.1k from the SECOND message
b) 10.2k = 10.1k from the whole thread conversation + 0.1k from the SECOND message
c) 0.1k = 0.1k from the SECOND message, because the model already charged for the files and reusing historical data does not generate costs
d) something between 0.1k and 10.1k = the SECOND message costs 0.1k and there is an algorithm which generates some unknown extra cost based on historical messages (in short: gambling cost)
e) it works totally differently: nobody can explain how tokens are calculated, so users must be ready to pay for 128k tokens per request in the worst case, as this is the maximum context window of GPT-4.


Thanks for the detailed case. My experiments also showed high token over-usage, so (b) seems to be the answer to your question.
My case: Real context sharing by assistant within thread - API / Feedback - OpenAI Developer Forum