The instructions you write are only part of a larger set of unseen instructions that is sent to the AI model on every call. More importantly, when you use any functionality beyond plain instructions, the AI agent can make multiple repeated, iterative calls, each of which costs tokens.
Then, as the conversation grows, there is no truncating it: every chat turn, every assistant response, and the inputs and outputs of functions and retrieval all load the AI context up to its maximum. There is absolutely no way to account for, or predict, how costly a single user input could turn out to be.
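To see why this matters, here is a rough back-of-the-envelope model of the effect: if every run re-sends the entire thread as input up to a context cap, cumulative input tokens grow roughly quadratically with the number of turns. The function name, token counts, and context size below are hypothetical placeholders for illustration, not real API behavior or pricing.

```python
# Rough model of cumulative billed input tokens when every run
# re-sends the whole thread (no truncation) up to a context cap.
# All numbers are hypothetical placeholders.

def cumulative_input_tokens(turns, tokens_per_turn, context_cap):
    total = 0
    thread = 0
    for _ in range(turns):
        # the thread keeps growing until it hits the context cap
        thread = min(thread + tokens_per_turn, context_cap)
        # the whole thread is billed as input on each run
        total += thread
    return total

# 20 turns of ~1,000 tokens each against a 128k-token context:
print(cumulative_input_tokens(20, 1_000, 128_000))  # → 210000
```

Note the result: 20 turns of 1,000 tokens each does not cost 20,000 input tokens, it costs 210,000, because each run pays again for everything that came before.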
You must use only the models gpt-4-1106-preview or gpt-3.5-turbo-1106, as they are the only ones that can emit the parallel tool calls required by retrieval (no, you don't get embeddings RAG for free); doing otherwise can produce costly failure loops. They will also currently fail on function calls containing accented characters and other unicode in some languages.
The cost you refer to, though, is the cost of data storage, multiplied by the number of unique assistants linked to that data storage, charged per day.
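In other words, the per-day charge scales with both storage size and the number of attached assistants. A minimal sketch of that arithmetic follows; the function name and the rate used in the example are hypothetical placeholders, not actual pricing.

```python
# Hypothetical illustration of per-day retrieval storage billing:
# storage size, multiplied by the number of assistants attached to
# that storage, multiplied by a per-GB-per-day rate.
# The rate below is a placeholder, not real pricing.

def daily_retrieval_cost(storage_gb, num_assistants, rate_per_gb_day):
    return storage_gb * num_assistants * rate_per_gb_day

# 2 GB of files shared by 5 assistants at a placeholder $0.10/GB/day:
print(daily_retrieval_cost(2, 5, 0.10))  # → 1.0
```

The point of the multiplication is easy to miss: attaching the same files to five assistants costs five times as much as attaching them to one, every single day.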
And no, there is no usage report for you when it's all done. All in all: avoid.