Search only a specific file within an attached vector store

My users are uploading individual pdfs and asking some questions about their contents. At some point in the future, they may wish to compare two documents, or ask questions about the content of several documents.

This sounds like a good use case for a vector store.

However, is it possible to put all of my documents into a single vector store (as and when my users upload them), then attach it to an assistant, and direct the assistant to answer questions using a single file (or a subset of files) within that vector store? Or will it automatically search across the whole vector store? Obviously I’d have the file IDs (and file names).

Otherwise, I’d have to create a new vector store for each (arbitrary) file or set of files, which seems a waste of money!

1 Like

@jonathan.jeffery - Welcome to the community,

If users are uploading files during runtime, you can seamlessly handle this by allowing multiple file attachments within a single message. This enables you to create runs using the same assistant, ensuring that the files uploaded by users are incorporated. You can either update the assistant with the vector store id or the user message.

Check this link out for details. Lmk if I missed something.

https://platform.openai.com/docs/assistants/tools/file-search/step-2-upload-files-and-add-them-to-a-vector-store

file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)

You can upload your files onto a single vector store. it would perform similarity check across all documents and you cannot limit it to just one if you would want your design to have a common vector store.

Cheers!

1 Like

Welcome to the dev forum @jonathan.jeffery

According to the docs, here’s what happens when files are uploaded within messages:

You can also attach files as Message attachments on your thread. Doing so will create another vector_store associated with the thread, or, if there is already a vector store attached to this thread, attach the new files to the existing thread vector store. When you create a Run on this thread, the file search tool will query both the vector_store from your assistant and the vector_store on the thread.

2 Likes

I have used vector stores with many files. I name my files using a version of APA7 referencing: “Author Year Title.PDF” eg “Carroll A. (2016) - Carroll’s pyramid of CSR -Taking another look”. I have successfully targeted specific files by prompting eg “What does Carroll 2016 say about pyramid of CSR?”.

This approach is also handy for referencing work because the built in referencing (in playground at least) shows the file name, which makes it easy to reference in my reports.

image

I also think this feature would be extremely useful @jonathan.jeffery … It would be really cool if OpenAI would consider file folders / hierarchy when it comes to vector storage and being able to point/scope an assistant’s request to a specific folder.