Hi, I’m new here, and new to the Assistants API and its Retrieval tool.
I don’t quite understand how Retrieval works step by step. I hope you can verify whether I’m on the right track.
From my understanding: when someone asks an assistant a question and the assistant decides it needs context, it triggers the RAG step? The Retrieval tool then keeps appending text from the files to the message until the assistant is satisfied, and only then does the assistant start generating an answer based on what Retrieval passed in? And is the context that gets passed in counted as input tokens? (This is my main question.)
Here is my case as an example.
I created an assistant with 2 files attached. When I ask it something that requires the content of those files, the RAG step starts passing text in. Is that text counted as input tokens? And will I be charged for it at something like the GPT-4 Turbo input rate (0.1 USD / 1K tokens)?
It seems that Retrieval tends to pass the whole files instead of just the most relevant parts, which results in a steep price?
In my case I asked only one question and it reports 10633 tokens (10122 input + output).
Does that mean a single question will cost at least 1 dollar? That’s an awful lot of money.
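To make my arithmetic concrete, here is a rough sketch of how I am estimating the cost of that one question. The function name `estimate_cost_usd` is just something I made up for illustration. The first estimate uses the 0.1 USD / 1K input rate I quoted above; the second uses 0.01 USD / 1K input and 0.03 USD / 1K output, which are only assumed rates that I have not confirmed against the pricing page. I am also assuming that everything in the total that is not input is output.

```python
def estimate_cost_usd(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k):
    """Estimate the cost of one request from token counts and per-1K-token rates."""
    input_cost = input_tokens / 1000 * input_rate_per_1k
    output_cost = output_tokens / 1000 * output_rate_per_1k
    return input_cost + output_cost


# Token counts reported for my single question.
total_tokens = 10633
input_tokens = 10122
# Assumption: the remainder of the total is output tokens.
output_tokens = total_tokens - input_tokens

# Estimate 1: input only, at the rate I quoted (0.1 USD per 1K tokens).
cost_quoted = estimate_cost_usd(input_tokens, 0, 0.1, 0.0)

# Estimate 2: assumed rates (NOT confirmed): 0.01 USD per 1K input tokens,
# 0.03 USD per 1K output tokens.
cost_assumed = estimate_cost_usd(input_tokens, output_tokens, 0.01, 0.03)

print(f"at the quoted rate:  {cost_quoted:.2f} USD")
print(f"at the assumed rate: {cost_assumed:.2f} USD")
```

If the rate I quoted is off by a factor of ten, the same question would cost roughly ten cents instead of a dollar, so the per-token rate matters as much as the token count here.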
And what is a context token? (Is it the same as an input token?) I don’t think so: if it were, I would have been charged something like 20 USD, but my bill is 3.5 USD this month.
(Sorry for my bad English. I’m not a native speaker. I hope this won’t be an issue.)
Thanks for your patience and time.