I’m still trying to understand token usage by the Assistants API. Here are my statistics:
- Added a file that contains 3,600 words
- The bot instructions are 17 words
- The run-time instructions are 69 words
- The question asked is 8 words
That is 3,694 words in total, which comes to approximately 5,346 tokens.
With the first reply, the token usage shows 6,159 tokens, 6,052 of which are input tokens alone.
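As a rough cross-check, OpenAI’s usual rule of thumb is about 0.75 English words per token. A minimal sketch using only the word counts above (the real tokenizer count will differ, as the tiktokenizer figure shows):

```python
# Rough word-to-token estimate: ~0.75 English words per token (OpenAI's rule of thumb).
words = 3600 + 17 + 69 + 8   # file + bot instructions + run-time instructions + question
estimated_tokens = round(words / 0.75)
print(words, "words ->", estimated_tokens, "tokens")  # 3694 words -> 4925 tokens
```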
My question is: even if the entire content is added as context, it should be under 6,000 tokens, so how can there be 6,052 input tokens? Can somebody shed some light on this?
Token usage is given in the screenshot:
That seems quite unlikely. The token count is going to be higher than the word count.
Paste your text: https://tiktokenizer.vercel.app/
I added the full document and the instructions, and it comes to 5,346 tokens (even my input token count is greater than this). But that is the full text of the attached files. Is it supposed to use chunks of matched context from the vector search, or will it input the full content each time?
Based on these results, I have corrected the title and the figures in my first post.
Here is the screenshot.
The documentation answers that the Assistants agent framework pays no mind to your budget…
Retrieval currently optimizes for quality by adding all relevant content to the context of model calls. We plan to introduce other retrieval strategies to enable developers to choose a different tradeoff between retrieval quality and model usage cost.
“All relevant content” = all that will fit in the model’s context length.
The assistant and its internal functions for retrieval and other tools have their own prompt language, which also consumes tokens.
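Using the numbers reported in this thread, the gap between the measured text tokens and the billed input tokens gives a rough size for that hidden overhead (a sketch; the attribution of the difference to internal prompting is an assumption, not something the API breaks out):

```python
# Figures reported earlier in this thread.
file_and_instruction_tokens = 5346  # full file + instructions, per tiktokenizer
billed_input_tokens = 6052          # input tokens on the first run

# The difference is plausibly the assistant's hidden system/tool prompting.
hidden_overhead = billed_input_tokens - file_and_instruction_tokens
print(hidden_overhead)  # 706
```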
But the thing is that, even with this much input context, the quality of the output is not that great. I tested the same document against other startups’ products, and they provide accurate answers. For example, I have a table listing the distance between two locations; it contains 10 rows with From, To, and the distance for each. The Assistant gave me a wrong answer, but two other RAG-as-a-service products returned exactly the same answer as in the table. I think we need to wait for this to mature; I’m worried about putting it into production.
A little update: with the new GPT-3.5 Turbo 0125 release, the input token usage has been significantly reduced.