From my tests, it seems to bill you for two things:
- the full conversation history plus the answer, each time you press Run.
- the actual token usage of any intermediate steps that a run creates when it uses tools.
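To make the first point concrete, here is a rough sketch of how input tokens would add up if the whole history is resent on every run, as my tests suggest. The function name and the token counts are made up for illustration, and it ignores tool steps and any truncation the API may apply.

```python
def billed_tokens(turns):
    """Return (input_tokens, output_tokens) billed for each run.

    `turns` is a list of (user_tokens, answer_tokens) pairs.
    Assumption: every run resends the full prior history as input.
    """
    history = 0
    runs = []
    for user_tokens, answer_tokens in turns:
        # Input for this run = everything said so far + the new user message.
        input_tokens = history + user_tokens
        runs.append((input_tokens, answer_tokens))
        # The answer becomes part of the history for the next run.
        history = input_tokens + answer_tokens
    return runs


# Hypothetical three-turn conversation.
runs = billed_tokens([(50, 200), (30, 150), (40, 300)])
print(runs)  # [(50, 200), (280, 150), (470, 300)]
```

Even with short messages, the billed input grows with every turn because earlier turns are counted again.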
So there are no special gifts: you just pay for everything happening under the hood, which is understandable. However, in terms of control, it is difficult to manage. If your user uploads a very long document and the Retrieval method does not produce results, it will suggest actually reading the whole document (which will count as input tokens), and you can quickly end up at $1 for a single request.
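As a back-of-the-envelope check on that $1 figure, here is a small estimate. The prices and token counts below are placeholders I picked for the example, not official numbers, so plug in the current rates for your model.

```python
def estimate_run_cost(input_tokens, output_tokens,
                      input_price_per_1k, output_price_per_1k):
    """Estimate the cost in dollars of one run from its token counts."""
    input_cost = (input_tokens / 1000) * input_price_per_1k
    output_cost = (output_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


# Assumed values: a 100,000-token document read in full, a 1,000-token answer,
# and hypothetical prices of $0.01 / 1K input and $0.03 / 1K output tokens.
cost = estimate_run_cost(100_000, 1_000, 0.01, 0.03)
print(f"${cost:.2f}")
```

Under these assumptions almost all of the cost comes from the document being read as input, which is why a single question on a long file gets expensive so quickly.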
One question I have: if you ask another question that requires reading the whole document, will it read it all over again, or has it extracted and saved some kind of info to avoid reading everything again… (which I honestly doubt…)