Hi, I’m wondering if someone familiar with the Assistants billing setup can help me understand how it calculates context tokens.
So yesterday I was testing an assistant with knowledge retrieval in the Playground.
I created an assistant, then asked 5 or 6 questions. For each question I attached a text file (each containing around 4k English words). I did some quick summary-style work and was done. Then I deleted all the assistants and files.
Later, I checked my billing and saw roughly 170K context tokens and 1K generated tokens.
What exactly were those 170K context tokens? Does the assistant reading files count toward context tokens?
I’m a little confused here. I guess what I’m really asking is how I can estimate my costs when working with assistants and files.
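For what it’s worth, here is the rough back-of-envelope sketch I tried, using tiktoken. It assumes (a) the cl100k_base encoding approximates the model’s tokenizer and (b) the full thread history, file content included, gets resent as context on every run; the word counts and run count are just stand-ins for my session above. I’m not certain (b) is how the Assistants API actually bills, so treat this as a hypothesis, not an answer:

```python
# Back-of-envelope context-token estimate. Assumption (not confirmed):
# each run resends the entire thread so far, file content included.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-class models

def count_tokens(text: str) -> int:
    """Count tokens approximately the way the model would."""
    return len(enc.encode(text))

sample_file = "example " * 4000          # stand-in for one ~4k-word attachment
file_tokens = count_tokens(sample_file)  # roughly 4-5k tokens
question_tokens = 100                    # hypothetical size of each question

# If run i resends all i (file + question) pairs sent so far, the
# cumulative context cost over n runs grows roughly quadratically.
n_runs = 6
total_context = sum(i * (file_tokens + question_tokens) for i in range(1, n_runs + 1))
print(f"estimated cumulative context tokens: {total_context:,}")
```

With numbers like these, a handful of questions lands in the same order of magnitude as the 170K I saw, which is why I suspect the whole thread is resent each time. But again, that’s a guess.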
I have a similar question related to costs. I’ve noticed that when I create an Assistant with files, the ‘context token’ count gets really high. So my question is: are files read once per thread, or is every message “poisoned” by the files’ content, so that each time I ask a question in the same thread it incurs a high ‘context token’ cost?
Example:
Let’s assume we have a file of 10k tokens.
I create a thread.
I create a message with 100 tokens.
I run the Assistant.
I get charged 10.1k ‘context tokens’ plus X tokens for the generated result (this part is clear).
Then I write a SECOND message in the same thread, also 100 tokens.
Question: how many context tokens will the second message cost?
a) 10.1k = 10k from the file + 0.1k from the SECOND message
b) 10.2k = 10.1k from the whole thread conversation so far + 0.1k from the SECOND message
c) 0.1k = just the 0.1k from the SECOND message, because the files were already charged for and reusing historical data generates no cost
d) something between 0.1k and 10.2k = the SECOND message costs 0.1k, plus some algorithm generates an unknown extra cost based on the historical messages (in short: gambling on the cost)
e) it works totally differently: nobody can explain how tokens are calculated, so users must be prepared to pay for 128k tokens per request in the worst case, since that is GPT-4’s maximum context window.
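One way to settle a) through e) without guessing: recent versions of the openai Python SDK report token usage on completed Assistants runs, so you can send two messages in the same thread and compare the numbers. A minimal sketch, assuming your SDK version exposes run.usage (the assistant ID below is a placeholder):

```python
# Minimal sketch: ask two questions in one thread and compare the
# prompt (context) token counts reported for each run.
# Assumes the SDK exposes `usage` on completed runs; the ID is a placeholder.
from openai import OpenAI

client = OpenAI()

thread = client.beta.threads.create()

for question in ("First question?", "Second question?"):
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=question
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id="asst_abc123"  # placeholder assistant
    )
    if run.usage:
        print(question, "->", run.usage.prompt_tokens, "context tokens")
```

If the second run reports roughly 10.1k prompt tokens again, that points to a); ~10.2k and growing with each message points to b); ~0.1k points to c).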