Feature Request: Support Multiple Vector Stores per Run in Assistants API (Custom GPT Parity)

I’m trying to replicate the behaviour of Custom GPTs using the Assistants API v2 — specifically the ability to evaluate a user-uploaded file against a persistent knowledge base (e.g., compliance guidelines stored in a vector store).

At the moment, the API accepts only a single vector store in each tool_resources.file_search.vector_store_ids array, even though the plural field name suggests multiple stores should be possible. This limitation makes a very common use case unnecessarily complex:

“Evaluate this uploaded document against my existing knowledge base.”

In the current model, I’m forced to manually merge both the dynamic file (uploaded by the user) and the static reference (preloaded knowledge) into a temporary vector store every single time I run a thread — just to simulate what the ChatGPT interface already does natively.
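To make the workaround concrete, here is a minimal sketch of the two request bodies involved, written as plain dicts for the REST endpoints rather than SDK calls. The IDs and the helper names are placeholders of mine, not part of any official example.

```python
# Sketch of the per-run merge workaround: build one temporary vector store
# containing both the static knowledge-base files and the user's upload,
# then attach that single store to the thread. IDs are placeholders.

def build_merged_store_payload(kb_file_ids, upload_file_ids):
    """Body for POST /v1/vector_stores: a throwaway store that merges the
    persistent knowledge base with the dynamic upload for this one run."""
    return {
        "name": "temp-merged-store",  # recreated on every run
        "file_ids": kb_file_ids + upload_file_ids,
        # expire the store so the churn at least cleans itself up
        "expires_after": {"anchor": "last_active_at", "days": 1},
    }

def build_thread_payload(merged_store_id):
    """Body for POST /v1/threads: only ONE vector store ID is accepted in
    the array, which is the limitation this feature request is about."""
    return {
        "tool_resources": {
            "file_search": {"vector_store_ids": [merged_store_id]}
        }
    }
```

Every run therefore pays for file duplication and store creation just to combine two sources that already exist separately.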

Why this matters:

  • Custom GPTs implicitly support this dual-source approach: a persistent set of uploaded files combined with user-provided content at runtime
  • The Assistants API appears architecturally ready (separate tool_resources for Assistant and Thread), but lacks actual support for multiple vector stores per run
  • The vector_store_ids array hints that multiple stores were intended — yet validation currently blocks more than one

Request:

Please allow multiple vector stores to be used in a single run — ideally enabling one from the assistant (static knowledge) and one from the thread (dynamic uploads). This would:

  • Bring the API experience in line with the product experience
  • Simplify a huge number of real-world use cases (compliance checking, grading, auditing, translation verification, etc.)
  • Reduce unnecessary vector store churn and file duplication
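For illustration, this is what the requested behaviour might look like as a thread payload, i.e. a hypothetical body that the API currently rejects at validation time. It is not a working example, just the shape the existing field already implies.

```python
# HYPOTHETICAL: a thread payload listing both the static knowledge base and
# the user's upload in one vector_store_ids array. Today the API's
# validation rejects more than one entry; this is the requested change.

thread_payload = {
    "tool_resources": {
        "file_search": {
            "vector_store_ids": [
                "vs_static_kb",     # persistent compliance guidelines
                "vs_user_upload",   # per-run uploaded document
            ]
        }
    }
}
```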

Happy to clarify or test this when it’s on the roadmap. This one feature would dramatically reduce friction and expand what’s possible with Assistants.

Thanks!

  1. The Assistants API will be deprecated before mid-2026. It is doubtful OpenAI will add this feature to a soon-obsolete API. The Responses API is meant to replace the Assistants API.
  2. The Assistants API already supports 2 vector stores in the same run: the “general” VS attached to the assistant and the one attached to the thread. The Assistants API is able to search BOTH these vector stores to answer the user’s request as if they were a unified VS.
    BUT: because this is RAG, and the user request is rephrased/optimized for (supposedly better) retrieval, it sometimes fails to find information in both VS and looks as if it searched only one. To work around this, you can encourage the assistant (via instructions) to divide the original query into subqueries, or even use a function tool to do the split and then send the subqueries yourself
  3. The Responses API, just like the Assistants API, also hints that “one day” more than 2 VS could be supported …
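The function-tool variant of the subquery workaround from point 2 could be sketched like this. The tool name and schema are illustrative, not an official API; only the tool-definition format itself follows the standard function-calling shape.

```python
# Illustrative function tool for the subquery workaround: the model calls
# this to split the user's question into one subquery per knowledge source,
# and your code then submits each subquery as its own message/search.

split_query_tool = {
    "type": "function",
    "function": {
        "name": "split_into_subqueries",  # hypothetical name
        "description": (
            "Split the user's request into one subquery per knowledge "
            "source (static knowledge base vs. uploaded document)."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "subqueries": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "One self-contained query per source.",
                }
            },
            "required": ["subqueries"],
        },
    },
}
```

When the run returns a tool call, you would execute each subquery as a separate request so that both vector stores actually get searched, then feed the combined results back.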