Hi, I’m wondering if someone familiar with the Assistants billing setup can help me understand how its costs are calculated.
So yesterday I was testing an assistant with knowledge retrieval in the Playground.
I created an assistant, then asked about 5 or 6 questions; for each question, I attached a text file (each containing around 4k English words). I did some quick summary-style work and was done. Then I deleted all assistants and files.
Later, I checked the billing, and there were about 170K context tokens and 1K generated tokens.
I’m wondering what exactly those 170K context tokens were. Does the assistant reading files count as context tokens?
A little bit confused here. I think I’m really asking how I can estimate my cost when dealing with assistants and files.
More context: in those assistant responses, I saw references (raw text) to the files. Do those count as context tokens?
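For a rough sense of scale: English text averages on the order of 4 characters (about 0.75 words) per token, so each ~4k-word file is several thousand tokens before the thread history is counted at all. A minimal back-of-envelope sketch, where the 4-characters-per-token ratio is a rule of thumb and not an exact tokenizer:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb."""
    return round(len(text) / chars_per_token)

# A ~4k-word English file at ~5 characters per word plus a space is
# roughly 24k characters, i.e. on the order of 6k tokens.
words = 4000
avg_chars_per_word = 5  # assumption for plain English prose
approx_chars = words * (avg_chars_per_word + 1)  # +1 for the space
print(estimate_tokens("a" * approx_chars))  # 6000
```

Five or six files of that size, re-injected into the context across several questions, would plausibly add up to a six-figure context-token count.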
Anything sent to the model is counted.
If the model retrieves text from a file, that text is context, and it is billed accordingly.
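To turn a dashboard reading into dollars, the arithmetic is just tokens times the per-token rate for your model. A sketch using the 170K-context / 1K-generated numbers above; the rates here are illustrative placeholders, not current pricing, so substitute the ones from the pricing page:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  prompt_rate_per_1k: float, completion_rate_per_1k: float) -> float:
    """Dollar cost = (tokens / 1000) * per-1K rate, summed over input and output."""
    return (prompt_tokens / 1000 * prompt_rate_per_1k
            + completion_tokens / 1000 * completion_rate_per_1k)

# Placeholder GPT-4 Turbo-style rates of $0.01 input / $0.03 output per 1K tokens:
print(estimate_cost(170_000, 1_000, 0.01, 0.03))  # 1.73
```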
I have a similar question related to costs. I’ve noticed that when I create an assistant with files, the context-token count gets really high. So my question is: are files read once per thread, or is every message padded with the files’ content, so that each time I ask a question in the same thread it generates a high context-token cost?
- Let’s assume we have a file of 10k tokens
- I create a thread
- I create a message with 100 tokens
- I run the assistant
- I’m billed 10.1k context tokens plus X tokens for the produced result (this part is clear)
- Then I write a SECOND message in the same thread, also containing 100 tokens.
Question: how many context tokens will the second message cost?
a) 10.1k = 10k from the file + 0.1k from the SECOND message
b) 10.2k = 10.1k from the whole thread conversation so far + 0.1k from the SECOND message
c) 0.1k = only the SECOND message, because the files were already charged and reusing historical data does not generate costs
d) something between 0.1k and 1.0k = the SECOND message costs 0.1k, and an algorithm generates some unknown extra cost based on historical messages (in short: a gambling cost)
e) it works totally differently: nobody can explain how tokens are calculated, so users must be ready to pay for 128k tokens per request in the worst case, since that is GPT-4’s maximum context size.
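For what it’s worth, chat-style models are stateless: the prior thread history (and any retrieved file text) has to be resent as input on every run, which makes option (b)-style cumulative growth the likely behavior. A sketch of the scenario above, under the assumption that the full file plus all prior user messages are re-included each turn (assistant replies and system instructions are ignored for simplicity, so real numbers would be higher):

```python
def context_tokens_per_run(file_tokens: int, message_tokens: list[int]) -> list[int]:
    """Context (input) tokens billed on each run, assuming the file text and
    all messages so far are resent every turn."""
    costs = []
    for i in range(len(message_tokens)):
        history = sum(message_tokens[: i + 1])  # every user message up to and including this one
        costs.append(file_tokens + history)
    return costs

# File of 10k tokens, two 100-token messages:
print(context_tokens_per_run(10_000, [100, 100]))  # [10100, 10200]
```

Note that retrieval may inject only relevant chunks of a file rather than the whole thing, so the per-run file cost can be smaller (and less predictable) than the full file size.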