Understanding AI Assistant input token counts

I'm trying to understand the input token impact of Assistants files. I'm doing a lookup against a list of items in a JSON file I have uploaded.

The file has 6000 items in it, and the input query uses roughly 18000 tokens.

If I halve the items in the file to 3000, it still uses about 18000 tokens.

If I reduce the file to 1 item, and specifically ask for that one, it uses about 3000 tokens.

I have tried JSON, DOCX, PDF and TXT formats, all with pretty much the same result.
I have also tried chunking the file into 7 chunks in the vector store, with the same result.

There doesn't seem to be much correlation between the size or number of items in my file and the input tokens used, and 18000 tokens seems like a lot.

The obvious issue is cost, but I also hit the 30k tokens-per-minute request limit pretty quickly.
I have also tried playing with the "Chunk size" and "Chunk overlap" settings, although it's not really clear what those do, so I left them at the defaults of 800 and 400.

Is what I'm seeing about right, or do I have something fundamentally wrong?

The JSON is lots of rows like this, about 6000 "Items" rows in total split amongst the families. My query is to find the Identifier for a given Term; the term may not be exactly what the user enters, so the assistant is asked to look up the best matching term.

{
  "Family": "AXXXX",
  "Items": [
    {
      "Identifier": "2326",
      "Term": "xxxxxxxxxxxxxxxxxxx"
    }
  ],
  "Description": "xxxxxxxxxxxx"
},
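For reference, a minimal sketch of what this kind of lookup looks like with the Python SDK, assuming an assistant that already has file_search and the vector store attached (the assistant ID is a placeholder); run.usage is one place to read the input token count per request:

from openai import OpenAI

client = OpenAI()

ASSISTANT_ID = "asst_..."  # placeholder: an assistant with file_search and the vector store attached

# Ask for the best-matching term; file_search retrieves chunks from the uploaded JSON.
thread = client.beta.threads.create(messages=[
    {"role": "user", "content": "Find the Identifier for the Item whose Term best matches 'xxxxx'."}
])

run = client.beta.threads.runs.create_and_poll(thread_id=thread.id, assistant_id=ASSISTANT_ID)

# run.usage reports the token counts for this request (prompt_tokens = input tokens).
print(run.usage.prompt_tokens, run.usage.completion_tokens)

messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)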

You should consider reducing the number of retrieved chunks using the max_num_results parameter to cut token usage: https://platform.openai.com/docs/assistants/tools/file-search/vector-stores
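For what it's worth, by default file_search can pull up to 20 chunks of up to 800 tokens each into the prompt on gpt-4* models, which is roughly where a flat ~18000 input tokens comes from regardless of file size. In the Python SDK the cap goes on the file_search tool definition, something like the sketch below (the name, model, instructions and vector store ID are placeholders, and the right number to use depends on how many chunks actually contain relevant items):

from openai import OpenAI

client = OpenAI()

VECTOR_STORE_ID = "vs_..."  # placeholder for your existing vector store

# Cap how many retrieved chunks file_search may add to the prompt.
# Fewer chunks x 800-token default chunk size = fewer input tokens per request.
assistant = client.beta.assistants.create(
    name="Term lookup",
    model="gpt-4o-mini",
    instructions="Given a term, return the Identifier of the best matching Item.",
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 5},
    }],
    tool_resources={"file_search": {"vector_store_ids": [VECTOR_STORE_ID]}},
)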


Thanks, that knocks it down a lot…
OK, I think that makes sense. So it reads the data in chunks of that token size, hence the high token count.

Token overlap, I guess, means how many chunks can potentially be considered at the same time for the same query?

Hello, is there a tutorial on how I can do that? The documentation is not that clear. How can I reduce the number of chunks using max_num_results?

I was only using the UI at this stage to set it.

But the API reference seems OK: you set it up on the vector store, then add files to the vector store.

https://platform.openai.com/docs/api-reference/vector-stores/create
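Something along these lines, I think (a sketch only; the chunking values are just examples, and depending on SDK version the vector store calls may live under client.vector_stores rather than client.beta.vector_stores). Note that chunk size and overlap are set when attaching the file to the vector store, while max_num_results itself goes on the assistant's file_search tool, as in the sketch further up:

from openai import OpenAI

client = OpenAI()

# Create the vector store, upload the JSON, and attach it with an explicit
# chunking strategy (the defaults are 800-token chunks with 400-token overlap).
vector_store = client.beta.vector_stores.create(name="term-lookup")

uploaded = client.files.create(file=open("items.json", "rb"), purpose="assistants")

client.beta.vector_stores.files.create(
    vector_store_id=vector_store.id,
    file_id=uploaded.id,
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 400,  # example value; smaller chunks mean fewer tokens per retrieved chunk
            "chunk_overlap_tokens": 200,   # must be no more than half of max_chunk_size_tokens
        },
    },
)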