Understanding AI Assistant input token counts

I'm trying to understand the input token impact of Assistants files. I'm doing a lookup against a list of items in a JSON file I have uploaded.

The file has 6000 items in it, and the input query uses roughly 18000 tokens.

If I halve the items in the file to 3000, it still uses about 18000 tokens.

If I reduce the file to 1 item, and specifically ask for that one, it uses about 3000 tokens.

I have tried JSON, DOCX, PDF and TXT formats, all with pretty much the same result.
I have also tried chunking the file into 7 chunks in the vector store, with the same result.

There doesn't seem to be much correlation between the size or number of items in my file and the input tokens used, and 18000 tokens seems like a lot.

The obvious issue is cost, but I also hit the 30k tokens-per-minute request limit pretty quickly.
I have also tried playing with the "Chunk size" and "Chunk overlap" settings, although it's not really clear what those do, so I left them at the defaults of 800 and 400.

Is what I'm seeing about right, or do I have something fundamentally wrong?

The JSON is lots of rows like this, about 6000 "Items" rows in total split amongst the families. My query is to find the Identifier for a given Term; the term may not be exactly what the user enters, so the assistant is asked to look up the best matching term.

{
  "Family": "AXXXX",
  "Items": [
    {
      "Identifier": "2326",
      "Term": "xxxxxxxxxxxxxxxxxxx"
    }
  ],
  "Description": "xxxxxxxxxxxx"
},
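For reference, a minimal sketch of what this kind of lookup looks like with the Python SDK, assuming an assistant that already has file_search and the vector store attached (the assistant ID is a placeholder); run.usage is one place to read the input token count per request:

from openai import OpenAI

client = OpenAI()

ASSISTANT_ID = "asst_..."  # placeholder: an assistant with file_search and the vector store attached

# Ask for the best-matching term; file_search retrieves chunks from the uploaded JSON.
thread = client.beta.threads.create(messages=[
    {"role": "user", "content": "Find the Identifier for the Item whose Term best matches 'xxxxx'."}
])

run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# run.usage reports the token counts for this request (prompt_tokens = input tokens).
print(run.usage.prompt_tokens, run.usage.completion_tokens)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)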

You should consider reducing the number of retrieved chunks using the max_num_results parameter to cut token usage: https://platform.openai.com/docs/assistants/tools/file-search/vector-stores
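For what it's worth, by default file_search can pull up to 20 chunks of up to 800 tokens each into the prompt on gpt-4* models, which is roughly where a flat ~18000 input tokens comes from regardless of file size. In the Python SDK the cap goes on the file_search tool definition, something like the sketch below (the name, model, instructions and vector store ID are placeholders, and the right number to use depends on how many chunks actually contain relevant items):

from openai import OpenAI

client = OpenAI()

VECTOR_STORE_ID = "vs_..."  # placeholder for your existing vector store

# Cap how many retrieved chunks file_search may add to the prompt.
# Fewer chunks x 800-token default chunk size = fewer input tokens per request.
assistant = client.beta.assistants.create(
    name="Term lookup",
    model="gpt-4o-mini",
    instructions="Given a term, return the Identifier of the best matching Item.",
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 5},
    }],
    tool_resources={"file_search": {"vector_store_ids": [VECTOR_STORE_ID]}},
)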


Thanks, that knocks it down a lot…
OK, I think that makes sense. So it reads the data in chunks of that token size, hence the high token count.

Token overlap, I guess, means how many chunks can potentially be considered at the same time for the same query?

Hello, is there a tutorial on how I can do that? The documentation is not that clear. How can I reduce the number of chunks using max_num_results?

I was only using the UI at this stage to set it.

But the API reference seems OK: you set it up on the vector store, then add files to the vector store.

https://platform.openai.com/docs/api-reference/vector-stores/create
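Something along these lines, I think (a sketch only; the chunking values are just examples, and depending on SDK version the vector store calls may live under client.vector_stores rather than client.beta.vector_stores). Note that chunk size and overlap are set when attaching the file to the vector store, while max_num_results itself goes on the assistant's file_search tool, as in the sketch further up:

from openai import OpenAI

client = OpenAI()

# Create the vector store, upload the JSON, and attach it with an explicit
# chunking strategy (the defaults are 800-token chunks with 400-token overlap).
vector_store = client.beta.vector_stores.create(name="term-lookup")

uploaded = client.files.create(file=open("items.json", "rb"), purpose="assistants")

client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 400,  # example value; smaller chunks mean fewer tokens per retrieved chunk
            "chunk_overlap_tokens": 200,   # must be no more than half of max_chunk_size_tokens
        },
    },
)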