How to refine the results from file search

I’m using the Assistants API to extract precise information from the files in a vector store. On every new run, with the same parameters and temperature set to 0.2, the assistant outputs different results, some of which are not consistent with what’s in the files, and the output is never quite what I want even though I add additional instructions to the run. It seems the assistant doesn’t go through the whole files. For the record, I have 10+ files in the vector store and the chunking strategy is at its maximum values.
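For reference, this is roughly what that setup looks like in the Python SDK. It is only a sketch of the configuration described above: the ids, model name, and instruction strings are placeholders, not my actual values.

```python
from openai import OpenAI

client = OpenAI()

VECTOR_STORE_ID = "vs_..."  # placeholder: the store already holding the 10+ files

# Assistant that answers via file search over the attached vector store
assistant = client.beta.assistants.create(
    name="doc-extractor",
    model="gpt-4o",  # placeholder model
    instructions="Extract information strictly from the attached files.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [VECTOR_STORE_ID]}},
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="List every figure quoted for 2023."
)

# Same parameters on every run: temperature pinned at 0.2, extra instructions added
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
    temperature=0.2,
    additional_instructions="Quote the source file for every value you report.",
)
```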

What do you suggest to make the assistant respond with accurate and consistent results?

top_p: 0

Maybe then the AI will write search queries with more similar text between runs. You’d also have to keep the entire chat identical if you don’t want different things written as queries to file search, and then find out what the AI actually sent as a search term.
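Concretely, both sampling knobs can be pinned on the run itself. A minimal sketch with placeholder ids; note that the file_search query strings themselves aren’t necessarily exposed, so the run steps below only show which tool calls were made:

```python
from openai import OpenAI

client = OpenAI()
THREAD_ID = "thread_..."    # placeholder
ASSISTANT_ID = "asst_..."   # placeholder

run = client.beta.threads.runs.create_and_poll(
    thread_id=THREAD_ID,
    assistant_id=ASSISTANT_ID,
    temperature=0,  # greedy token choice
    top_p=0,        # the suggestion above; try e.g. 0.01 if a literal 0 is rejected
)

# After the run finishes, its steps list which tool calls were made
# (file_search vs. message creation), useful for comparing two runs.
for step in client.beta.threads.runs.steps.list(thread_id=THREAD_ID, run_id=run.id):
    print(step.step_details)
```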

The vector store itself should be relatively deterministic, since the chunks and their embeddings are already set in stone. But the embeddings model, which is run again on every query (unless they cache by string hash to save a fraction of a penny), will still return slightly different values for each dimension between runs.
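The Assistants file search doesn’t expose its internal embedding calls, but the public embeddings endpoint shows the same effect. A small sketch; the model choice here is an assumption, a stand-in for whatever file search uses internally:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Embed one string with a public embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Embed the identical query twice and measure how close the two vectors are.
a = embed("ostrich farming economics")
b = embed("ostrich farming economics")
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"same text, two calls, cosine similarity: {cos:.6f}")  # often just under 1.0
```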

Overall, an assistant using vector stores cannot be perfectly consistent or of the highest quality, simply because a short query the AI writes, like “ostrich farming economics”, has little semantic similarity with the form and nature of large chunks of extracted document text, so those chunks are hard to rank well.
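You can measure that mismatch directly by comparing a terse query against a document-style chunk. Both strings below are made up purely for illustration, and the model is again a stand-in:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

query = embed("ostrich farming economics")
chunk = embed(
    "Section 4.2: Capital outlay for a breeding pair, perimeter fencing and "
    "incubation equipment is amortised over several seasons; feed remains the "
    "largest recurring cost on most ratite operations, followed by veterinary care."
)
score = float(query @ chunk / (np.linalg.norm(query) * np.linalg.norm(chunk)))
print(f"short query vs. long chunk cosine similarity: {score:.3f}")
```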

Even with top_p = 0, file search is still not extracting everything I’m looking for; if anything, it’s returning fewer results than desired.
What would be the best solution if attaching a vector store to the assistant isn’t very efficient? I can’t attach all the files to the thread because I have more than 10 files.

“Attaching” files still just puts them into a vector store (a thread-scoped one if none exists), so retrieval works the same way.
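For example, uploading a file and attaching it to a message still routes it through a vector store behind the scenes. A sketch, with the thread id and file name as placeholders:

```python
from openai import OpenAI

client = OpenAI()
THREAD_ID = "thread_..."  # placeholder: an existing thread

# Upload, then "attach" to a message for file_search; the file still ends up
# chunked and embedded in a vector store, so retrieval behaves the same way.
uploaded = client.files.create(file=open("report.pdf", "rb"), purpose="assistants")

client.beta.threads.messages.create(
    thread_id=THREAD_ID,
    role="user",
    content="Summarise the attached report.",
    attachments=[{"file_id": uploaded.id, "tools": [{"type": "file_search"}]}],
)
```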

You can also play with how much data is returned per search: either through the chunking itself, or through the newer parameter that limits the total number of results the tool hands back.
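That limit sits on the file_search tool definition. A sketch; the assistant id is a placeholder and the value of 8 is arbitrary:

```python
from openai import OpenAI

client = OpenAI()
ASSISTANT_ID = "asst_..."  # placeholder

# Cap how many retrieved chunks the search can hand back to the model per query.
assistant = client.beta.assistants.update(
    ASSISTANT_ID,
    tools=[{
        "type": "file_search",
        "file_search": {"max_num_results": 8},  # fewer chunks: cheaper, more focused
    }],
)
```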

Whether you return a whole bunch of chunks so that much more of the files is presented (making it expensive), or limit the chunks so the AI focuses on just the top results, neither is a proven winning strategy.

Depending on the data in your documents, re-embedding with a different chunk size may give you more specificity with smaller pieces, or it may lose the context around what is being returned.
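If you do re-chunk, the files have to go into a new vector store so they get re-split and re-embedded. A sketch with smaller-than-default chunk sizes; the exact numbers are just an example, and the `client.beta` namespace may differ by SDK version:

```python
from openai import OpenAI

client = OpenAI()
FILE_IDS = ["file_...", "file_..."]  # placeholders: the already-uploaded files

# Smaller chunks with overlap: more specific matches, less surrounding context per hit.
small_chunk_store = client.beta.vector_stores.create(
    name="docs-small-chunks",
    chunking_strategy={
        "type": "static",
        "static": {"max_chunk_size_tokens": 400, "chunk_overlap_tokens": 200},
    },
)

client.beta.vector_stores.file_batches.create(
    vector_store_id=small_chunk_store.id,
    file_ids=FILE_IDS,
)
```

Then point the assistant’s tool_resources at the new store and compare retrieval on the same questions.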