The data will not be chunked by “logical JSON snippet”. It will be chunked purely by token count across each document.
Your input cost per query, retrieving the top 20 chunks, can be 20 × 800 tokens, or effectively 20 × 1,200 or even 1,600, depending on how the documented overlap is interpreted. The only exception would be the occasional document-end “tail” chunk with fewer than 800 tokens, and that only if they don't simply take the last chunk as the final 800 tokens counted back from the end.
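A quick back-of-the-envelope for that, assuming 800-token chunks with a 400-token overlap and top-20 retrieval (substitute whatever chunk size, overlap, and top-k the service actually documents):

```python
# Assumed parameters: 800-token chunks, 400-token overlap, top-20 retrieval.
def chunk_count(doc_tokens: int, chunk_size: int = 800, overlap: int = 400) -> int:
    """Number of fixed-size chunks a document of doc_tokens produces."""
    stride = chunk_size - overlap
    if doc_tokens <= chunk_size:
        return 1
    # First full chunk, then ceiling division over the stride for the rest.
    return 1 + -(-(doc_tokens - chunk_size) // stride)

def per_query_input_tokens(top_k: int = 20, chunk_size: int = 800) -> int:
    """Retrieved context injected into the prompt for a single query."""
    return top_k * chunk_size

print(chunk_count(10_000))          # ~24 chunks for a 10k-token document
print(per_query_input_tokens())     # 16,000 tokens of retrieved context per query
```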
You could do some clever preprocessing if you perform the embeddings on the original single JSON yourself. After obtaining the embeddings (perhaps 2B tokens' worth?), you could rank them in 1D by distance from your task prompts, or run whatever nearly-free iterative computation you can leave sorting overnight on a workstation, and then cluster them yourself into highly focused sections.
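A minimal sketch of that sort-and-cluster step, assuming you already have one embedding per JSON item as rows of a numpy array (the file names, the task-prompt embedding, and the cluster count are all placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

item_vecs = np.load("item_embeddings.npy")   # shape (n_items, dim), unit-normalized
task_vec = np.load("task_embedding.npy")     # shape (dim,), embedding of one task prompt

# Rank items in 1D by cosine distance from the task: smaller = more relevant.
cosine_dist = 1.0 - item_vecs @ task_vec
ranking = np.argsort(cosine_dist)

# Cluster the items into focused sections; k is a free parameter to tune.
k = 50
labels = KMeans(n_clusters=k, n_init="auto").fit_predict(item_vecs)

# Group item indices by cluster so each section can be assembled separately.
sections = {c: np.flatnonzero(labels == c) for c in range(k)}
print("most task-relevant items:", ranking[:10])
```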
Of course, if you pay for embeddings once on the individual items, with embeddings that target the actual data, why would you continue to pay daily for gigabytes of vector database that weighs in at roughly (DATA × >150% + ~1 KB vector) × chunks, and whose retrieval is confused by chunks that literally mix messages together?
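For a sense of scale, a rough storage estimate under the same reading of that formula (all numbers here are assumptions: overlap inflating stored text to >150% of the raw data, ~1 KB per stored vector):

```python
def vector_store_bytes(data_bytes: float,
                       overlap_factor: float = 1.5,  # >150% of raw data once chunks overlap
                       vector_bytes: int = 1024,     # ~1 KB stored vector per chunk
                       num_chunks: int = 0) -> float:
    """Stored chunk text (inflated by overlap) plus one vector per chunk."""
    return data_bytes * overlap_factor + vector_bytes * num_chunks

# Example: ~1 GB of JSON split into roughly 650k chunks of 800 tokens with 400 overlap.
print(vector_store_bytes(1e9, num_chunks=650_000) / 1e9, "GB")   # ~2.17 GB hosted
```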