File search and token usage (assistant)

Hi there.
I am trying to build an AI bot that knows a lot about a specific person and answers questions about him. I have about 30 pages of data about the guy, divided into categories like basic details, family, childhood, personality, etc.
Since reading all the pages for every question is too slow and uses too many tokens, I need to implement a filter mechanism that fetches only the 2-3 texts where the answer to a question is most likely to be found.
my questions are :

  1. Does the Assistants API file search do this automatically? If I provide it with 30 files and instructions, will it read only the relevant 2-3 texts?
  2. If the Assistants file search does not do it automatically, is there another way to do it with the OpenAI API?
  3. If no solution exists on OpenAI's side, I wonder whether I should use a vector DB like Pinecone, or use another prompt before the main question to fetch the right files based on their titles (something like: give me the 3 titles corresponding to the texts most likely to answer: <list of titles: family, childhood, social circle, …>)
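For what it's worth, the routing idea in question 3 can be sketched without a vector DB. The helper names and category list below are made up for illustration; the sketch builds the pre-prompt and parses the model's reply, and the actual chat-completion call is left out since any completion endpoint would do:

```python
# Sketch of a "routing pre-prompt": ask the model which category files
# to load before answering the main question. Helper names and the
# category list are illustrative, not part of any OpenAI API.

CATEGORIES = ["basic details", "family", "childhood", "personality", "social circle"]

def build_routing_prompt(question: str, titles: list[str]) -> str:
    """Build the pre-prompt that asks the model to pick relevant titles."""
    return (
        "Given the question below, reply with the 3 titles (comma-separated, "
        "nothing else) whose texts are most likely to contain the answer.\n"
        f"Titles: {', '.join(titles)}\n"
        f"Question: {question}"
    )

def parse_reply(reply: str, titles: list[str]) -> list[str]:
    """Parse the model's comma-separated reply, keeping only known titles."""
    picked = [t.strip().lower() for t in reply.split(",")]
    return [t for t in picked if t in titles]

# The reply string here stands in for a real model response.
prompt = build_routing_prompt("Who were his closest friends growing up?", CATEGORIES)
chosen = parse_reply("childhood, social circle, family", CATEGORIES)
print(chosen)  # the 2-3 files to attach to the main prompt
```

You would then attach only the chosen files (or paste only those texts) into the main prompt.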

The quantity doesn't matter until you reach the point where prompt-stuffing becomes unreasonable.

  1. Yes
  2. OpenAI uses a vector database for their retrieval system. The main benefit of using an external one is finer control, at the expense of more setup and maintenance work

Large Language Models like GPT are extremely efficient at understanding context. You should be able to use the data in a natural conversation, without needing to manipulate it.


I agree with @anon10827405. Your application should work brilliantly using the standard OpenAI Assistants API. You just need to be careful about a couple of things:

  1. We’ve learned that (no surprise) getting the instructions right for these assistants makes all the difference. You need to be explicit about the need to search for the relevant content and perhaps how. And, of course, you need to explain what to do with what it finds. We struggled at first understanding that simply turning on the file_search tool is not enough.

  2. Remember that the assistant will break your content up into chunks (about 600 words each). File_search returns the most relevant chunks into the context for further analysis. If your pages are long, keep in mind that the identity of the person, for example, might appear only at the top of the page while some of the relevant information sits at the bottom, and chunking will break that connection. Structure those pages into small sections that keep related content together, such as putting the person's name at the top of each section.
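The restructuring step above can be sketched in a few lines. This is a hypothetical helper (the function name and section titles are mine): it joins titled sections, repeating the subject's name at the top of each one so that every chunk file_search produces still identifies him.

```python
# Sketch: restructure a long page so each section is self-contained.
# The subject's name is repeated at the top of every section, so that
# when file_search chunks the file, each chunk still identifies him.

def to_self_contained_sections(name: str, sections: dict[str, str]) -> str:
    """Join titled sections, prefixing each with the subject's name."""
    parts = []
    for title, body in sections.items():
        parts.append(f"{name} - {title}\n{body}")
    return "\n\n".join(parts)

# "John Doe" and the section bodies are placeholder data.
page = to_self_contained_sections(
    "John Doe",
    {
        "Childhood": "Grew up in a small coastal town...",
        "Family": "Has two younger sisters...",
    },
)
print(page)
```

The same idea works just as well if you prepare the sections by hand; the point is only that each small section should make sense on its own.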

I hope that’s helpful. I bet you’ll be impressed with the answers you get.

Thanks.
Do you think a pre-prompt to choose which texts to bring into the main prompt is a viable strategy to achieve the same result without a vector DB?

Thanks a lot.
So if I want it to use 3-4 chunks for each prompt, do I just need to write something like "fetch the 4 most relevant chunks" in the assistant instructions?
Also, what about token usage? Will it read all the files on every request?

Thanks in advance, could not find answers anywhere else 🙂

First, I should say that I remain humble on this subject, and I suspect others may be able to give you better guidance. I know that we had to experiment a lot with our instructions and settings until we got the behaviors we wanted.

Note that in addition to instructions there are also various settings. One of particular importance is the minimum score a search result must have to be included; another sets the maximum number of chunks returned. But be very careful with these. While you might think you could just set that number to 4 in your case (or whatever), along with a high score threshold, that is probably not the right thing to do. In my experience, the search scoring is far from perfect, and it is better to err on the side of including more results and letting the AI sort out which of them (now in its context) it will use to answer the question. The defaults are liberal: as many results as possible, with a minimum score of 0. In our case, we changed the minimum to 0.4, but I can't vouch for that as we haven't studied its effect closely.
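For reference, these settings are passed when creating the assistant. The shape below matches my understanding of the Assistants v2 API (`max_num_results` and `ranking_options.score_threshold` on the file_search tool), but verify against the current OpenAI documentation; the 0.4 threshold just mirrors the value mentioned above.

```python
# Hedged sketch of the file_search tool settings discussed above.
# Field names follow the Assistants v2 API as I understand it; check
# the current OpenAI documentation before relying on them.

file_search_tool = {
    "type": "file_search",
    "file_search": {
        "max_num_results": 20,          # cap on chunks returned into context
        "ranking_options": {
            "ranker": "auto",           # let OpenAI choose the ranker
            "score_threshold": 0.4,     # minimum relevance score (default is 0)
        },
    },
}

# Passed via: client.beta.assistants.create(..., tools=[file_search_tool])
print(file_search_tool)
```

Note that `max_num_results` only caps how many chunks come back; it does not force the model to use that many, which is why a liberal cap with a modest threshold tends to work better than a tight one.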

I recommend that you focus on the written instructions to the assistant. Read about how others are using prompt engineering, especially for assistants, and then be prepared to experiment until you are getting what you want. Near the top you'll want something like, "Search the uploaded documents for relevant content." (In our case, our instructions are structured as a series of steps, and this is one of those steps.)
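As a purely illustrative example of the step-structured instructions described above (the wording and the subject's name are placeholders, not a tested recipe):

```python
# Illustrative assistant instructions, structured as numbered steps.
# "John Doe" and the exact wording are placeholders for your own content.

INSTRUCTIONS = """You answer questions about John Doe.
1. Search the uploaded documents for content relevant to the question.
2. Answer using only the retrieved content.
3. If the documents do not contain the answer, say you do not know."""

print(INSTRUCTIONS)
```

You would pass this string as the `instructions` argument when creating the assistant.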

Good luck!
