I’ve read through a few posts here and the API references, and it seems there isn’t any way to filter or limit the file search of a vector store to a specific file or set of files, let’s say year <= 2010, or file names containing “Apple”. I would like to confirm whether that’s the case, or whether I’m missing any functionality that could implement such a feature, or any way to wrap the file search in an external tool or code. I’m mainly trying to avoid look-ahead bias, but creating a separate vector store for each year or company-year seems both tedious and wasteful. Any ideas are appreciated. Thanks!
You are correct. In Assistants, all files in a vector store, and even multiple vector stores when one is attached to an assistant and another is created as a thread attachment, are combined, and the top-ranked document chunks are returned from the entirety.
The documents are split into chunks, so the search can’t really return a whole file unless it is short; it only returns the chunks of highest relevance.
You could be extra-clever, and do something unattempted as far as I know: offer a tool to switch the document data that will be found in a file search, one where the AI is instructed to call it first to use the appropriate data.
With that you could capture the tool call, and then rebuild the existing vector store (keeping the same ID) with new documents from your subset list, only returning the tool output once extraction and inclusion are complete. This could only really work with a one-user assistant and vector store, as you can’t change anything else while a run is in progress. Then the next tool call from the AI might be an internal one to its file search.
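A rough sketch of the bookkeeping for that idea, untested against the live API. The tool name `select_documents`, its `max_year` parameter, the `CATALOG` mapping, and the `plan_swap` helper are all made up for illustration; the actual vector store calls are left as a comment.

```python
# Hypothetical function tool the AI is instructed to call before any file search.
SELECT_DOCUMENTS_TOOL = {
    "type": "function",
    "function": {
        "name": "select_documents",  # made-up name
        "description": "Choose which documents file search may use. Call this first.",
        "parameters": {
            "type": "object",
            "properties": {
                "max_year": {"type": "integer", "description": "Latest year allowed."},
            },
            "required": ["max_year"],
        },
    },
}

# Your own catalog: uploaded file ID -> metadata you track yourself (illustrative).
CATALOG = {
    "file-a": {"year": 2008},
    "file-b": {"year": 2010},
    "file-c": {"year": 2015},
}


def plan_swap(current_ids, max_year, catalog=CATALOG):
    """Return (to_remove, to_add) so the store ends up holding only files <= max_year."""
    wanted = {fid for fid, meta in catalog.items() if meta["year"] <= max_year}
    current = set(current_ids)
    return sorted(current - wanted), sorted(wanted - current)


to_remove, to_add = plan_swap(["file-a", "file-c"], max_year=2010)
# At this point you would remove `to_remove` from the vector store, add `to_add`,
# wait for indexing to finish, and only then submit the tool output to the run.
```

Computing a diff rather than wiping the store keeps the number of re-indexed files small, which matters because the run is blocked until the swap is done.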
Much easier to map out the kind of pattern you want to use to give the AI selective information, and just provide your own Chat Completions knowledge tools, or even have them powered through a user interface.
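As a minimal sketch of what such a do-it-yourself knowledge tool could look like: filter chunks by your own metadata first, then rank only the survivors by similarity. The `CHUNKS` data, the two-dimensional toy vectors, and the `search` function are assumptions for illustration; in practice the vectors would come from whatever embedding model you use.

```python
import math

# Each chunk carries metadata you control; the vectors here are toy values.
CHUNKS = [
    {"text": "Apple 2009 excerpt", "company": "Apple", "year": 2009, "vec": [1.0, 0.0]},
    {"text": "Apple 2015 excerpt", "company": "Apple", "year": 2015, "vec": [0.9, 0.1]},
    {"text": "Other 2008 excerpt", "company": "Other", "year": 2008, "vec": [0.0, 1.0]},
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, max_year, company=None, top_k=3, chunks=CHUNKS):
    """Drop chunks outside the allowed metadata, then rank the rest by similarity."""
    allowed = [
        c for c in chunks
        if c["year"] <= max_year and (company is None or c["company"] == company)
    ]
    ranked = sorted(allowed, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in ranked[:top_k]]


results = search([1.0, 0.0], max_year=2010)
apple_only = search([1.0, 0.0], max_year=2010, company="Apple")
```

Because the filter runs before ranking, a 2015 chunk can never reach the model however relevant it is, which is the point when you are trying to avoid look-ahead bias.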
I store each file separately and then attach them dynamically to the thread as needed.
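Roughly like this, as a sketch only: the `FILES` mapping, the file IDs, and the `build_attachments` helper are placeholders of my own, and the resulting list is what you would pass as `attachments` when creating the user message on the thread.

```python
# Hypothetical mapping you maintain yourself: uploaded file ID -> metadata.
FILES = {
    "file-111": {"company": "Apple", "year": 2009},
    "file-222": {"company": "Apple", "year": 2012},
    "file-333": {"company": "Other", "year": 2007},
}


def build_attachments(company, max_year, files=FILES):
    """Build a message `attachments` list containing only the matching files."""
    return [
        {"file_id": fid, "tools": [{"type": "file_search"}]}
        for fid, meta in sorted(files.items())
        if meta["company"] == company and meta["year"] <= max_year
    ]


attachments = build_attachments("Apple", 2010)
# Pass `attachments` along with the user message when adding it to the thread.
```

As noted earlier in the thread, thread attachments are combined with any vector store attached to the assistant itself, so this only isolates the search if the assistant has no store of its own.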