I'm trying to understand the token impact of Assistants files. I'm doing a lookup against a provided list of items in a JSON file I have uploaded.
The file has 6000 items in it, and the input query uses roughly 18000 tokens.
If I halve the items in the file to 3000, it still uses about 18000 tokens.
If I reduce the file to 1 item and specifically ask for that one, it uses about 3000 tokens.
I have tried JSON, DOCX, PDF, and TXT formats, all with pretty much the same result.
I have also tried splitting the file into 7 chunks in the vector store, with the same result.
There doesn't seem to be much correlation between the size or number of items in my file and the input tokens used, and 18000 tokens seems like a lot.
The obvious issue is cost, but the 30k tokens-per-minute limit also gets hit pretty quickly.
I have also tried playing with the “Chunk size” and “Chunk overlap” settings, although it's not really clear what those do, so I left them at the defaults of 800 and 400.
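For reference, here is roughly how those two settings map onto the vector store setup, a minimal sketch using the OpenAI Python SDK. It assumes the client.beta.vector_stores namespace, the static chunking strategy, and a placeholder file name items.json; the values shown are just the defaults mentioned above.

from openai import OpenAI

client = OpenAI()

# Upload the lookup file (file name is a placeholder)
lookup_file = client.files.create(
    file=open("items.json", "rb"),
    purpose="assistants",
)

# Attach it to a vector store with an explicit static chunking strategy.
# These values match the defaults: each chunk holds up to 800 tokens, and
# adjacent chunks overlap by 400 tokens. The chunk size bounds how many
# input tokens each retrieved chunk can add to the prompt.
vector_store = client.beta.vector_stores.create(name="item-lookup")
client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=lookup_file.id,
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 800,
            "chunk_overlap_tokens": 400,
        },
    },
)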
Is what I'm seeing about right, or do I have something fundamentally wrong?
The JSON is lots of entries like the one below, about 6000 “Items” rows in total, split amongst the families. My query is to find the “Identifier” for a given “Term”; the term may not be exactly what the user enters, so the assistant is asked to look up the best-matching term.
{
  "Family": "AXXXX",
  "Items": [
    {
      "Identifier": "2326",
      "Term": "xxxxxxxxxxxxxxxxxxx"
    }
  ],
  "Description": "xxxxxxxxxxxx"
},