Chat with one file in a multi file vector store or combine vector stores

It is really a deal breaker when you cannot pass in one file id and use that as your only knowledge base document. Creating multiple stores adding files then deleting them is a nightmare. IMHO you should be able to pass in multiple file ids and/or store ids so you can chat with the data in those documents. The Vector Store and File Search is a great idea but falls flat unless it is used for a group of static documents. Is there a work around where a Prompt has the magic words that say only use the data in this file or that file?

All items from an attached vector store are searched against when using Responses or Assistants.

In Assistants, if you have one-time files, it may be better to make them user message tool attachments, created alongside a thread message, and then they become part of a temporary vector store just for that thread. The vector store expires after seven days, but the uploaded files still need management.

There is another way you can go about this. That is to use the pay-per-use “vector store search endpoint”.

The facility that the standalone endpoint offers is “annotations”. You can add metadata to a file when adding it to a vector store. You can then specify queries using that metadata. You could have the file name itself as one metadata field you search on, thus ignoring others.

Read more: https://platform.openai.com/docs/api-reference/vector-stores/search

Then you can present that information via chat messages automatically, or by giving the AI a tool to call upon.

1 Like

Thanks a lot for this detailed explanation — it’s very helpful!

I wanted to ask a quick follow-up:
If I want to use metadata (like file name) to restrict a search to just one specific file within a vector store, could you share a practical example of how the query would look using the search endpoint?

Also, if I’m using the Assistant UI (not the API), is there any way to replicate this behavior, or is that only possible via API calls?

Appreciate your help!

Sure. First you set the metadata. You can be creative, like adding the filename, upload date, customer id, topic, whatever. This call does it after-the-fact on a file in a vector store.

def add_file_meta(id, file_id, key="filemeta1", value="filevalue1") -> dict:
    """
    add a single additional "attribute" key to a file in a vector store
    args: vector store ID, file ID, the key name, the value to set
    """
    import os, httpx
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError("OPENAI_API_KEY environment variable is not set")
    url = f"https://api.openai.com/v1/vector_stores/{id}/files/{file_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {"attributes": {key: value}}
    with httpx.Client(timeout=10.0) as client:
        response = client.post(url, headers=headers, json=body)
        response.raise_for_status()
    return response.json()

And then to get search results, implement a function with the “filters” I’ve commented out and the pattern you’d use them.

def search_vs(id, query="a placeholder text", max=1) -> dict:
    """
    use the search endpoint for a query, get "max" results.
    """
    import os, httpx
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise EnvironmentError("Search: OPENAI_API_KEY not set")
    url = f"https://api.openai.com/v1/vector_stores/{id}/search"
    headers = {"Authorization": f"Bearer {api_key}"}
    body = {
        "query": query,
        "max_num_results": max,
        #"filters": {
        #    "key": "filetype",
        #    "type": "eq",
        #    "value": "support",
        #}
    }
    with httpx.Client(timeout=20.0) as client:
        response = client.post(url, headers=headers, json=body)
        response.raise_for_status()
    return response.json()

“eq” is equals. For integers, such as sizes or epoch timestamps, you can use other searches such as greater or less than.

2 Likes

Thank you very much for your detailed explanation and help! I really appreciate your time and effort.

1 Like