Let’s discuss two issues I’ve recently encountered while using the Assistants API: token consumption and cost uncertainty.
The first issue is token-consumption transparency. The Assistants API does not expose a clear field reporting how many tokens each message sent and received consumes, which makes it difficult for developers to calculate costs. The usage page in the backend does show which model was used and how many tokens were consumed, but in my recent tests the request counts it reports have been inconsistent: for the same question-and-answer pattern (a single query and response), the number of requests shown varies widely, sometimes counted as one request, other times as many as eight. This leaves me confused about where tokens are actually going.
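In the absence of a reliable per-message usage field, one workaround is to estimate token counts client-side. Below is a minimal sketch using the common rough heuristic of ~4 characters per English token; a real tokenizer (e.g. the tiktoken library) would be more accurate, and the function names here are my own, not part of the API:

```python
def estimate_tokens(text: str) -> int:
    """Rough client-side token estimate (~4 characters per token for English)."""
    return max(1, len(text) // 4)


def estimate_turn_tokens(history: list[str], new_message: str) -> int:
    """Approximate prompt tokens for one turn.

    The full conversation history is resent with every turn, so the
    estimate grows with each prior message in the thread.
    """
    return sum(estimate_tokens(m) for m in history) + estimate_tokens(new_message)


# Example: a short thread where each new turn re-includes everything before it
history = ["What is in the attached report?", "The report summarizes Q3 sales."]
print(estimate_turn_tokens(history, "Which region grew fastest?"))
```

This only approximates the message text itself; it cannot account for whatever the backend adds on top (tool calls, retrieval chunks, system prompts), which is exactly the part that remains opaque.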
The second issue is cost uncertainty. When a file is bound at the message level, its entire contents appear to be included in the token calculation, and because each question and answer carries the conversation history, this leads to significant token consumption. I am also unclear whether each retrieval of content from the file counts towards the tokens. As a result, the size of a file bound at the message level substantially affects the token consumption of the whole thread. In my case, five consecutive conversations through the Assistants API cost $0.18, $0.03, $0.03, $0.06, and $0.40 respectively, with the backend showing 5, 6, 7, 8, and 16 requests. This inconsistency makes it hard to understand how tokens are being consumed, especially when working with files bound at the message level.
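To make the variance concrete, the five conversations above can be tallied directly (the figures are copied from my backend usage page):

```python
# Observed figures from five consecutive conversations on the usage page
costs = [0.18, 0.03, 0.03, 0.06, 0.40]   # dollars per conversation
requests = [5, 6, 7, 8, 16]               # requests shown in the backend

total_cost = round(sum(costs), 2)         # total spend across the thread
total_requests = sum(requests)
cost_per_request = [round(c / r, 4) for c, r in zip(costs, requests)]

print(total_cost, total_requests)
print(cost_per_request)
```

Even normalized per request, the cost ranges from about $0.0043 to $0.036, nearly an order of magnitude, which is consistent with the suspicion that message-level file contents are being re-counted on some turns but not others.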