This is how the Assistants API calculates token usage within a thread that uses function calling.
User:
Tell me about my site today
Assistant:
Calling function… (JSON as return)
Assistant:
return response.
(6K tokens)
User:
do you remember the information in your previous message?
Assistant:
Yes, I remember the details from the previous message. How can I assist you further with that information?
(6K tokens)
User:
If I ask a similar question again, will you call the function again to retrieve data, or use the information obtained from the previous message?
Assistant:
If your question is closely related to the information already retrieved and discussed in the previous message, I would use the same information to answer your question without calling the function again. This helps provide quick and efficient responses based on the most recent data available. However, if the question requires new or updated data that wasn’t previously retrieved, I might need to call the relevant function again. How may I assist you further?
(6K tokens)
User:
OK tell me the total traffic from US, UK and India
Assistant:
Based on the information from the previous data:
…
(6K tokens)
User:
is the sky blue?
(6K tokens)
What do you think? Is it fair to calculate tokens this way? I have multiple functions available, and some of them might return 100-200 KB of JSON.
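To make the concern concrete, here is a rough Python sketch of what I think is happening. It assumes the whole thread history, including the function's JSON output, is sent as input again on every run, which is what the flat 6K figure on each turn suggests. The 4-characters-per-token estimate is only a heuristic, not the real tokenizer, and the helper names (`estimate_tokens`, `per_turn_prompt_tokens`, `slim_payload`) and the sample payload are made up for illustration.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def per_turn_prompt_tokens(messages: list[str]) -> list[int]:
    """Estimated input tokens per turn if every earlier message is resent."""
    totals = []
    running = 0
    for text in messages:
        running += estimate_tokens(text)
        totals.append(running)
    return totals


def slim_payload(payload: dict, keep: list[str]) -> str:
    """Keep only the listed top-level keys and serialize without extra whitespace."""
    trimmed = {key: payload[key] for key in keep if key in payload}
    return json.dumps(trimmed, separators=(",", ":"))


# Hypothetical function result: a small summary plus a large raw section.
payload = {
    "traffic_by_country": {"US": 1200, "UK": 300, "IN": 450},
    "raw_events": ["x" * 50] * 400,
}
full = json.dumps(payload)
slim = slim_payload(payload, ["traffic_by_country"])

# Same conversation, once with the full JSON in history and once with the slim one.
for label, tool_output in (("full", full), ("slim", slim)):
    history = ["Tell me about my site today", tool_output, "is the sky blue?"]
    print(label, per_turn_prompt_tokens(history))
```

If that assumption holds, the unrelated last question still pays for the function output, so returning only the fields the model needs from each function looks like the main lever I have with 100-200 KB responses.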