Bug: Vector store status: completed does not guarantee searchability - file_search returns empty results silently

Summary

After creating a vector store with vectorStores.create({ file_ids: […] }) and polling until status === “completed”, immediate file_search tool calls against the store return empty or sparse results. There is no error, no warning, and no way to distinguish “index not ready” from “document doesn’t contain this content.” The index only becomes fully queryable after an indeterminate delay (5+ seconds after the status reaches “completed”).

Environment

  • API: Responses API (responses.create) with tools: [{ type: “file_search” }]
  • Model: gpt-5.2
  • Document: ~2.1M character PDF (1,200+ pages), preprocessed into raw text with markers, uploaded via files.create then attached to vector store
  • Reasoning effort: tested at none/low/medium (I initially thought it was a reasoning issue; higher reasoning effort gave better results, but only because those responses took longer, which gave the vector store more time to spin up)

Reproduction steps

  1. Upload a large PDF via files.create({ purpose: “assistants” })
  2. Create a vector store: vectorStores.create({ name: “…”, file_ids: [fileId] })
  3. Poll vectorStores.retrieve(storeId) until status === “completed”
  4. Immediately start making responses.create calls with tools: [{ type: “file_search”, vector_store_ids: [storeId] }]
  5. Observe that early calls return responses with no retrieved content, while later calls (same store, same document, same prompt structure) return rich results
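For anyone who wants to try this, here is a minimal sketch of the steps above. The ClientLike interface is my own assumption: it only mirrors the calls named in the steps, it is not the SDK's full type definitions, and a real client may need a cast to fit it. The function name, the store name and the poll interval are hypothetical as well. It returns the input token count of each call so early and late calls can be compared.

```typescript
// Structural types covering only the calls named in the steps above.
// They are illustrative assumptions, not the SDK's own type definitions.
interface VectorStoreLike { id: string; status: string }
interface ResponseLike { usage?: { input_tokens: number } }
interface ClientLike {
  vectorStores: {
    create(params: { name: string; file_ids: string[] }): Promise<VectorStoreLike>;
    retrieve(id: string): Promise<VectorStoreLike>;
  };
  responses: {
    create(params: {
      model: string;
      input: string;
      tools: { type: "file_search"; vector_store_ids: string[] }[];
    }): Promise<ResponseLike>;
  };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Creates a store, polls until "completed", then queries immediately.
// Returns the input token count reported for each prompt, in order.
async function reproduce(
  client: ClientLike,
  fileId: string,
  prompts: string[],
  model: string,
  pollMs = 1000, // hypothetical poll interval
): Promise<number[]> {
  let store = await client.vectorStores.create({ name: "repro-store", file_ids: [fileId] });
  while (store.status !== "completed") {
    if (store.status === "expired") throw new Error("Vector store expired while polling");
    await sleep(pollMs);
    store = await client.vectorStores.retrieve(store.id);
  }

  const inputTokens: number[] = [];
  for (const input of prompts) {
    // No delay on purpose: the first calls land right after completion.
    const response = await client.responses.create({
      model,
      input,
      tools: [{ type: "file_search", vector_store_ids: [store.id] }],
    });
    inputTokens.push(response.usage?.input_tokens ?? 0);
  }
  return inputTokens;
}
```

If the bug is present, the first few entries of the returned array should sit near the prompt-only baseline while later entries are much larger.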

Evidence

I ran 8 generation passes over the same 11 comments against the same document (this pipeline answers people’s comments against a source document). Each run creates a new vector store, polls to completion, then processes comments in batches of 5.

Input token counts directly measure how much content file_search returned (higher = more retrieved chunks). The system prompt + comment text alone is ~9-10k tokens.

Run: 1 (pre-existing store attached) (Expected Behavior)
Reasoning: low
Batch 1 comment 1: 77,916
Batch 1 comment 2: 76,582
Batch 1 comment 5: 93,939
Batch 2 (all): 25-43k
────────────────────────────────────────
Run: 2 (Bug Present in runs 2-7)
Reasoning: low
Batch 1 comment 1: 10,902
Batch 1 comment 2: 9,535
Batch 1 comment 5: 110,223
Batch 2 (all): 25-43k
────────────────────────────────────────
Run: 3
Reasoning: low
Batch 1 comment 1: 9,578
Batch 1 comment 2: 10,962
Batch 1 comment 5: 72,879
Batch 2 (all): 17-53k
────────────────────────────────────────
Run: 4
Reasoning: none
Batch 1 comment 1: 9,448
Batch 1 comment 2: 10,298
Batch 1 comment 5: 9,722
Batch 2 (all): 17-26k
────────────────────────────────────────
Run: 6
Reasoning: none
Batch 1 comment 1: 9,471
Batch 1 comment 2: 9,585
Batch 1 comment 5: 10,289
Batch 2 (all): 25-26k
────────────────────────────────────────
Run: 7
Reasoning: none
Batch 1 comment 1: 10,911
Batch 1 comment 2: 9,630
Batch 1 comment 5: 10,289
Batch 2 (all): 25-26k

Key observations:

  • ~9-10k input = zero file_search content returned. The model receives only the system prompt and comment text. It searches the store, gets nothing back, and responds with “The administrative record does not contain…”, which is factually wrong; the content is in the store, it’s just not searchable yet.
  • Run 2 batch 1 shows the index coming online in real time: 10k → 10k → 60k → 27k → 110k across 5 sequentially-processed comments in the same batch.
  • Run 1 had a second, pre-existing vector store (created in a prior session) attached alongside the new one. That run had no retrieval failures because the pre-indexed store provided content immediately.
  • Batch 2 is always fine (roughly 17-53k input across the runs above) because by the time batch 1 finishes processing (~60-90 seconds), the index is “fully warm”.
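The token-count heuristic behind these observations can be sketched roughly as follows. This is only an illustration: checkRetrieval, the 9,500-token baseline and the 2,000-token margin are my own assumptions, picked to match the ~9-10k prompt-only figure above. None of them are values the API exposes.

```typescript
interface RetrievalCheck {
  inputTokens: number;
  retrievedTokens: number; // input tokens beyond the prompt-only baseline
  likelyEmpty: boolean;    // true when file_search appears to have returned nothing
}

// Flags a response as "likely empty retrieval" when its input token count
// is within a margin of the prompt-only baseline. The margin is a
// hypothetical tuning knob that absorbs per-comment length differences.
function checkRetrieval(
  inputTokens: number,
  baselineTokens: number,
  marginTokens = 2000,
): RetrievalCheck {
  const retrievedTokens = Math.max(0, inputTokens - baselineTokens);
  return {
    inputTokens,
    retrievedTokens,
    likelyEmpty: retrievedTokens <= marginTokens,
  };
}

// Run 2, batch 1, comments 1, 2 and 5 from the results above.
const run2Flags = [10902, 9535, 110223].map((n) => checkRetrieval(n, 9500).likelyEmpty);
console.log(run2Flags); // [ true, true, false ]
```

A check like this only detects the problem after the fact. It cannot tell “index not ready” apart from a query that genuinely has no match.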

Impact

  • Silent data loss. The model produces confident but content-free responses. There is no error or signal that retrieval failed; the file_search tool simply returns no results. The consumer cannot distinguish “index not ready” from “content not found.”
  • Non-deterministic output quality. Identical inputs produce dramatically different outputs depending on timing relative to index creation. This is invisible without inspecting token counts, and left me scratching my head for quite a while.
  • No workaround signal. There’s no file_counts.indexed field, no search-readiness endpoint, and no error on the file_search tool call. The only way to detect this is to monitor input token counts or do a test query, neither of which the API is designed to support.

Expected behavior

Either:

  1. status: “completed” should mean the index is queryable (i.e., don’t report completion until search works), OR
  2. Add a distinct status like “search_ready” that indicates queryability, OR
  3. Have the file_search tool return an error/warning when querying an index that isn’t fully propagated (e.g., “status”: “index_warming” in the tool result), so consumers can retry

Current workaround

I added a post-poll delay and a test probe query loop on my side, which has worked so far, but this is guesswork since there’s no API signal for when the index is actually ready.
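Roughly, the delay-plus-probe loop looks like the sketch below. The probe is supplied by the caller, for example a query for a phrase known to be in the document that resolves to true once anything comes back. The function name, the option names and the default timings are my own guesses (the 5 s initial delay comes from the delay I observed). None of this is an API contract.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

interface ProbeOptions {
  initialDelayMs?: number; // fixed wait after status reaches "completed"
  intervalMs?: number;     // wait between failed probes
  maxAttempts?: number;    // give up after this many probes
}

// Waits, then calls `probe` until it reports a hit.
// Resolves with the number of probes used; throws if the limit is reached.
async function waitUntilSearchable(
  probe: () => Promise<boolean>,
  options: ProbeOptions = {},
): Promise<number> {
  const { initialDelayMs = 5000, intervalMs = 2000, maxAttempts = 30 } = options;
  await sleep(initialDelayMs);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (await probe()) return attempt;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Vector store not searchable after ${maxAttempts} probes`);
}
```

The probe query has to be one that is certain to match the document. Otherwise a legitimately empty result looks the same as an index that is not ready.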

Hey @morritse

Appreciate you flagging this. We’re going to take a closer look and dig into what might be causing it. We’ll report back as soon as we know more.

Facing a similar issue when using file_search in the Assistants API (semantic search on the retrieval API may also be affected). I’ve implemented a fallback semantic search function and added instructions to use this fallback function if file_search doesn’t find any results. The fallback function takes a query and uses OpenAI’s retrieval API to find relevant chunks from my vector store and include them in the current context.

Whenever file_search doesn’t find any results, the fallback function is called immediately but it also doesn’t find any relevant chunks. However, if I hit the retrieval API manually via curl, with the exact same query, configs and everything else, searching the same vector store, I get chunks (most of the time). The search fails approx once every 50 requests, and I haven’t been able to reliably reproduce the failure.

Seems like an issue with the vector store’s availability, and not the file_search tool specifically.

Hi and welcome to the community!

This may be an edge case where polling a newly created vector store immediately returns unexpected results. A simple workaround is to wait a second or two before working with the new vector store.

Is that the issue you are seeing?

No, I don’t think so. The vector store is not created immediately before the search; it exists permanently, and I just pass its ID when creating a run. The files in the vector store are refreshed once daily by a scheduled task, with no obvious correlation between the time the files are updated and the times the semantic search queries fail.

I have the same issue. There is also another thread about it. If you run the same file search against the same vector store repeatedly, you will get some percentage of empty results. Is OpenAI looking into this? We can reproduce it.

While what you’re describing here might be an issue, it’s not the same issue the post describes. VB is correct in his understanding of the issue, but now there’s some ambiguity here. The issue is strictly when creating a new vector store. Once the vector store exists, I’ve never had a problem with weird retrievals.

I wouldn’t really call it an edge case, though. It happens every single time this flow is run, and it’s so insidious that I think there are probably tons of people experiencing it without realizing. For big documents, I imagine it isn’t that uncommon for users to create a vector store and immediately query it. I am currently using a workaround with a spin-up delay and a status probe that waits until the vector store size stops increasing, but this should really be baked into vector store creation/querying.
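In case it helps anyone, the “wait until the size stops increasing” part is roughly the sketch below. getSize is whatever the caller uses to read the store size, for instance the usage_bytes value on the object returned by vectorStores.retrieve. Treating a stable size as “searchable” is purely my heuristic, not documented behavior, and the function name, parameter names and defaults are made up for the example.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `getSize` until it returns the same non-zero value for
// `stablePolls` consecutive polls after the first reading of that value.
// Resolves with the stable size; throws after `maxPolls` readings.
async function waitForStableSize(
  getSize: () => Promise<number>,
  stablePolls = 3,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<number> {
  let last = -1;
  let stable = 0;
  for (let i = 0; i < maxPolls; i++) {
    const size = await getSize();
    // Count a poll as stable only if the size is unchanged and non-zero.
    stable = size === last && size > 0 ? stable + 1 : 0;
    last = size;
    if (stable >= stablePolls) return size;
    await sleep(intervalMs);
  }
  throw new Error(`Size did not stabilize within ${maxPolls} polls`);
}
```

This still can’t prove the index is ready. It only narrows the window, which is why a real readiness signal from the API would be better.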