You are charged when you initiate the run and invoke the Assistant.
This is covered in the documentation: you are charged for the complete conversation on each run.
It depends on the size of the file. It may be fed to the model in full as tokens, or it may be chunked and converted into embeddings, in which case it is only practical to calculate a MAX cost based on the size of the document. The model iterates through the chunks until it finds a suitable answer.
You can use a library like tiktoken to count tokens, and you can also average the costs of your previous runs against the knowledge base. It's much safer to set a usage limit up front and then retrieve the calculated costs afterwards.
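Since the worst case is the entire file being sent to the model as tokens, you can cap your estimate by the document size. A minimal sketch of that idea, using a rough chars-per-token heuristic (tiktoken gives exact counts) and a made-up placeholder price, not an official figure:

```python
# Rough worst-case cost estimate for a run, assuming the whole document
# could be fed to the model as tokens. The per-1K price and the
# chars-per-token ratio are illustrative assumptions only.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # For accurate counts, use tiktoken's encode() instead.
    return max(1, len(text) // 4)

def max_run_cost(document: str, prompt: str,
                 price_per_1k_input: float = 0.01) -> float:
    # Worst case: the entire document plus the prompt is sent as input.
    total_tokens = estimate_tokens(document) + estimate_tokens(prompt)
    return total_tokens / 1000 * price_per_1k_input

doc = "word " * 2000  # stand-in for a knowledge-base file
print(max_run_cost(doc, "What does the document say about pricing?"))
```

Averaging this MAX against what your previous runs actually cost gives a more realistic budget figure.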
This is a downside to using Assistants.