I’m building an LLM-based application where I already have a custom search engine that returns a list of relevant file IDs based on the user’s query.
What I’d like to do is:
Upload those files to OpenAI, and have the assistant answer based only on that exact list of file IDs: no general retrieval, no full index, just scoped to those specific files.
These files can change dynamically for each query, depending on what my search engine finds.
What I’ve tried:
I explored the file_search and vector_store tools in the Assistants API.
However:
The file_ids argument is no longer supported.
Using the attachments + file_search method creates a new unnamed vector store every time I ask a question.
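Roughly the pattern I mean, as a minimal sketch with the Python SDK (the file IDs are placeholders for what my search engine returns):

```python
from openai import OpenAI

client = OpenAI()

# Attaching files per message with the file_search tool makes the API create
# an unnamed, thread-scoped vector store for them behind the scenes.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Answer using only the attached documents.",
    attachments=[
        {"file_id": fid, "tools": [{"type": "file_search"}]}
        for fid in ["file-abc123", "file-def456"]
    ],
)
```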
All items in an attached vector store are searched when using the built-in tool in Responses or Assistants, with no additional filtering parameter available.
But there is a dedicated search endpoint, and that is another way you can go about this: use the vector store search endpoint, which is pay-per-use at the same rate as Responses tool calls.
The facility the standalone endpoint offers is metadata filtering. You can add attributes to a file when adding it to a vector store, or update them on a file already in the store. You might add the file's own ID, or your own database key that you are tracking.
You can then specify filters using those attributes, to exclude certain files or to allow only certain files. You could construct a whole series of "or" clauses, at least until you find out whether there is a limit on the size of the filter you can send.
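A rough sketch of that with the Python SDK; the doc_id key, the store and file IDs, and the values are placeholders for whatever keys you track:

```python
from openai import OpenAI

client = OpenAI()

# Attach your own key as an attribute when adding the file to the store
# (or later, with client.vector_stores.files.update).
client.vector_stores.files.create(
    vector_store_id="vs_123",
    file_id="file-abc123",
    attributes={"doc_id": "my-db-key-42"},
)

# Search only the files your own engine selected, via an "or" of equality
# filters on that attribute.
allowed = ["my-db-key-42", "my-db-key-97"]
results = client.vector_stores.search(
    vector_store_id="vs_123",
    query="the user's question",
    filters={
        "type": "or",
        "filters": [{"type": "eq", "key": "doc_id", "value": v} for v in allowed],
    },
    max_num_results=10,
)
for hit in results:
    print(hit.filename, hit.score)
```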
You can then present those search results to the model automatically via chat messages, or by giving the AI a developer-defined function to call.
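If you take the function route, here is a sketch of a Responses-style tool definition (the names are illustrative, not anything the API requires):

```python
from openai import OpenAI

client = OpenAI()

# The model calls this function; your code handles the call by running the
# vector store search with your doc_id filter and returning the top chunks.
search_tool = {
    "type": "function",
    "name": "search_selected_docs",
    "description": "Search only the documents pre-selected for this query.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search text"},
        },
        "required": ["query"],
    },
}

response = client.responses.create(
    model="gpt-4.1",  # any current model
    input="What do the selected files say about the topic?",
    tools=[search_tool],
)
```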
An advantage is that your application is not hurt by OpenAI jamming up the context with the "the user has uploaded files" messages that come along with file_search.
Yep, this is a solution I had thought of (the vector store with metadata). The problem is that each vector store has a limit of 10k files, and I want to upload 100k. Maybe I can force the request to search across multiple vector stores, filtered by the specific metadata I want.
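Something like this fan-out is what I have in mind (just a sketch; the store IDs and the doc_id attribute are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# One store per partition (each capped at 10k files) plus the keys my
# search engine selected for this query.
store_ids = ["vs_part_1", "vs_part_2", "vs_part_3"]
allowed = ["doc-42", "doc-97"]
user_query = "the user's question"

hits = []
for vs in store_ids:
    page = client.vector_stores.search(
        vector_store_id=vs,
        query=user_query,
        filters={
            "type": "or",
            "filters": [{"type": "eq", "key": "doc_id", "value": v} for v in allowed],
        },
        max_num_results=10,
    )
    hits.extend(page.data)

# Merge the partitions and keep the best-scoring chunks overall.
hits.sort(key=lambda h: h.score, reverse=True)
top_chunks = hits[:10]
```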
Moreover, is it necessary to use threads in this case?
Threads are part of Assistants; previous_response_id is part of the Responses endpoint if you opt for server-side chat state. They only concern how you maintain a user's chat history, and don't inform data retrieval techniques (except by the inconvenience of persisting what should expire).
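For reference, server-side chat state on Responses is just this (the model name is a placeholder):

```python
from openai import OpenAI

client = OpenAI()

first = client.responses.create(model="gpt-4.1", input="First question")
followup = client.responses.create(
    model="gpt-4.1",
    input="Follow-up question",
    previous_response_id=first.id,  # carries the prior turns server-side
)
```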
If you continue to find ways that OpenAI's vector store doesn't fit your needs (its main conveniences being that it doesn't bill upfront for running embeddings on documents and that it does naive extraction and splitting for you), you might want to investigate your own vector store solution.
Getting a query embedding from an embeddings model and running cosine similarity locally with TensorFlow can give you rankings and a top-k faster than a remote API call. Then, considering the massive bulk you might be searching against with little semantic differentiation, you can run a high-quality reranker on the top results.
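A rough sketch of that local ranking step, assuming you already hold precomputed document embeddings in memory (the shapes and k are placeholders):

```python
import tensorflow as tf

# doc_embeddings: (num_docs, dim) precomputed document embeddings
# query_embedding: (dim,) vector from the same embeddings model
doc_embeddings = tf.random.normal((100_000, 1536))  # stand-in data
query_embedding = tf.random.normal((1536,))

# Cosine similarity is the dot product of L2-normalized vectors.
docs_n = tf.math.l2_normalize(doc_embeddings, axis=-1)
query_n = tf.math.l2_normalize(query_embedding, axis=-1)
scores = tf.linalg.matvec(docs_n, query_n)  # (num_docs,)

# Top-k candidates to hand to a high-quality reranker.
top = tf.math.top_k(scores, k=50)
candidate_indices = top.indices.numpy()
```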
Then you can think about your own partitioning techniques.
You want to keep language generation models out of the equation, because they are orders of magnitude more expensive and slower.