File search and token usage (assistant)

Hi there.
I am trying to build an AI bot that knows a lot about a specific person and answers questions about him. I have about 30 pages of data about the guy, divided into categories like basic details, family, childhood, personality, etc.
Since reading all the pages for every question is too slow and uses too many tokens, I need to implement a filter mechanism that fetches only the 2-3 texts where the answer to a question is most likely to be found.
my questions are :

  1. Does the Assistants API file search do this automatically? If I provide it with 30 files and instructions, will it read only the relevant 2-3 texts?
  2. If the Assistants file search does not do it automatically, is there another way to do it with the OpenAI API?
  3. If no solution exists on OpenAI's side, I wonder whether I should use a vector DB like Pinecone, or use another prompt before the main question to fetch the right files based on their titles (something like: give me the 3 titles corresponding to the texts most likely to answer: <list of titles: family, childhood, social circle, …>)
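For what it's worth, the routing idea in question 3 can be sketched without a vector DB. The helper names and category list below are made up for illustration; the sketch builds the pre-prompt and parses the model's reply, and the actual chat-completion call is left out since any completion endpoint would do:

```python
# Sketch of a "routing pre-prompt": ask the model which category files
# to load before answering the main question. Helper names and the
# category list are illustrative, not part of any OpenAI API.

CATEGORIES = ["basic details", "family", "childhood", "personality", "social circle"]

def build_routing_prompt(question: str, titles: list[str]) -> str:
    """Build the pre-prompt that asks the model to pick relevant titles."""
    return (
        "Given the question below, reply with the 3 titles (comma-separated, "
        "nothing else) whose texts are most likely to contain the answer.\n"
        f"Titles: {', '.join(titles)}\n"
        f"Question: {question}"
    )

def parse_reply(reply: str, titles: list[str]) -> list[str]:
    """Parse the model's comma-separated reply, keeping only known titles."""
    picked = [t.strip().lower() for t in reply.split(",")]
    return [t for t in picked if t in titles]

# The reply string here stands in for a real model response.
prompt = build_routing_prompt("Who were his closest friends growing up?", CATEGORIES)
chosen = parse_reply("childhood, social circle, family", CATEGORIES)
print(chosen)  # the 2-3 files to attach to the main prompt
```

You would then attach only the chosen files (or paste only those texts) into the main prompt.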

The quantity doesn't matter until you reach the point where prompt-stuffing becomes unreasonable.

  1. Yes
  2. OpenAI uses a vector database for their retrieval system. The main benefit of using an external one is finer control, at the expense of more setup and maintenance work

Large Language Models like GPT are extremely efficient at understanding context. You should be able to use the data in a natural conversation, without needing to manipulate it.


I agree with @anon10827405. Your application should work brilliantly using the standard OpenAI Assistants API. You just need to be careful about a couple of things:

  1. We’ve learned that (no surprise) getting the instructions right for these assistants makes all the difference. You need to be explicit about the need to search for the relevant content and perhaps how. And, of course, you need to explain what to do with what it finds. We struggled at first understanding that simply turning on the file_search tool is not enough.

  2. Remember that the assistant will break your content up into chunks (about 600 words each). File_search returns the most relevant chunks into the context for further analysis. If your pages are long, keep in mind that the identity of the person, for example, might appear only at the top of the page while some of the relevant information sits at the bottom, and chunking will break that connection. Structure those pages into small sections that keep related content together, such as putting the person's name at the top of each section.
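The restructuring step above can be sketched in a few lines. This is a hypothetical helper (the function name and section titles are mine): it joins titled sections, repeating the subject's name at the top of each one so that every chunk file_search produces still identifies him.

```python
# Sketch: restructure a long page so each section is self-contained.
# The subject's name is repeated at the top of every section, so that
# when file_search chunks the file, each chunk still identifies him.

def to_self_contained_sections(name: str, sections: dict[str, str]) -> str:
    """Join titled sections, prefixing each with the subject's name."""
    parts = []
    for title, body in sections.items():
        parts.append(f"{name} - {title}\n{body}")
    return "\n\n".join(parts)

# "John Doe" and the section bodies are placeholder data.
page = to_self_contained_sections(
    "John Doe",
    {
        "Childhood": "Grew up in a small coastal town...",
        "Family": "Has two younger sisters...",
    },
)
print(page)
```

The same idea works just as well if you prepare the sections by hand; the point is only that each small section should make sense on its own.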

I hope that’s helpful. I bet you’ll be impressed with the answers you get.

Thanks.
Do you think a pre-prompt to choose which texts to bring into the main prompt is a viable strategy to achieve the same result without a vector DB?

Thanks a lot.
So if I want it to use 3-4 chunks for each prompt, do I just need to write something like "fetch the 4 most relevant chunks" in the assistant instructions?
Also, what about token usage? Will it read all the files on every request?

Thanks in advance, could not find answers anywhere else 🙂

First, I should say that I remain humble on this subject, and I suspect others may be able to give you better guidance. I know that we had to experiment a lot with our instructions and settings until we got the behaviors we wanted.

Note that in addition to instructions there are also various settings. One of particular importance is the minimum score a search result must have to be included; another sets the maximum number of chunks returned. But be very careful with these. While you might think you could just set that number to 4 in your case (or whatever), along with a high score threshold, that is probably not the right thing to do. In my experience, the search scoring is far from perfect, and it is better to err on the side of including more results and letting the AI sort out which of them (now in its context) it will use to answer the question. The defaults are liberal: as many results as possible, with a minimum score of 0. In our case, we changed the minimum to 0.4, but I can't vouch for that as we haven't studied its effect closely.
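For reference, these settings are passed when creating the assistant. The shape below matches my understanding of the Assistants v2 API (`max_num_results` and `ranking_options.score_threshold` on the file_search tool), but verify against the current OpenAI documentation; the 0.4 threshold just mirrors the value mentioned above.

```python
# Hedged sketch of the file_search tool settings discussed above.
# Field names follow the Assistants v2 API as I understand it; check
# the current OpenAI documentation before relying on them.

file_search_tool = {
    "type": "file_search",
    "file_search": {
        "max_num_results": 20,          # cap on chunks returned into context
        "ranking_options": {
            "ranker": "auto",           # let OpenAI choose the ranker
            "score_threshold": 0.4,     # minimum relevance score (default is 0)
        },
    },
}

# Passed via: client.beta.assistants.create(..., tools=[file_search_tool])
print(file_search_tool)
```

Note that `max_num_results` only caps how many chunks come back; it does not force the model to use that many, which is why a liberal cap with a modest threshold tends to work better than a tight one.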

I recommend that you focus on the written instructions to the assistant. Read about how others are using prompt engineering, especially for assistants, and then be prepared to experiment until you are getting what you want. Near the top you'll want something like, "Search the uploaded documents for relevant content." (In our case, our instructions are structured as a series of steps, and this is one of those steps.)
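As a purely illustrative example of the step-structured instructions described above (the wording and the subject's name are placeholders, not a tested recipe):

```python
# Illustrative assistant instructions, structured as numbered steps.
# "John Doe" and the exact wording are placeholders for your own content.

INSTRUCTIONS = """You answer questions about John Doe.
1. Search the uploaded documents for content relevant to the question.
2. Answer using only the retrieved content.
3. If the documents do not contain the answer, say you do not know."""

print(INSTRUCTIONS)
```

You would pass this string as the `instructions` argument when creating the assistant.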

Good luck!
