Bug: Vector store status: completed does not guarantee searchability - file_search returns empty results silently
Summary
After creating a vector store with vectorStores.create({ file_ids: […] }) and polling until status === "completed", file_search tool calls made immediately against the store return empty or sparse results. There is no error, no warning, and no way to distinguish “index not ready” from “document doesn’t contain this content.” The index becomes fully queryable only after an indeterminate delay (5+ seconds post-completion).
Environment
- API: Responses API (responses.create) with tools: [{ type: "file_search" }]
- Model: gpt-5.2
- Document: ~2.1M character PDF (1,200+ pages), preprocessed into raw text with markers, uploaded via files.create then attached to vector store
- Reasoning effort: tested at none/low/med (I initially thought it was a reasoning issue; higher reasoning led to better results because responses simply took longer, giving the vector store more time to spin up)
Reproduction steps
- Upload a large PDF via files.create({ purpose: "assistants" })
- Create a vector store: vectorStores.create({ name: "…", file_ids: [fileId] })
- Poll vectorStores.retrieve(storeId) until status === "completed"
- Immediately start making responses.create calls with tools: [{ type: "file_search", vector_store_ids: [storeId] }]
- Observe that early calls return responses with no retrieved content, while later calls (same store, same document, same prompt structure) return rich results
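The steps above can be sketched roughly as follows. This is a minimal sketch, not the pipeline from this report: the ReproClient interface is a simplified, hypothetical stand-in for the SDK client that only covers the calls named in the steps, the upload step is assumed to have already produced fileId, and the store name, pollMs, and maxPolls are placeholder values.

```typescript
// Minimal structural types for the calls named in the steps above.
// Assumption: these shapes are simplified for the sketch and are not the
// SDK's full type definitions.
interface VectorStore {
  id: string;
  status: string; // "completed" is the value the report polls for
}

interface ReproClient {
  vectorStores: {
    create(args: { name: string; file_ids: string[] }): Promise<VectorStore>;
    retrieve(id: string): Promise<VectorStore>;
  };
  responses: {
    create(args: {
      model: string;
      input: string;
      tools: { type: "file_search"; vector_store_ids: string[] }[];
    }): Promise<{ usage?: { input_tokens: number } }>;
  };
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Creates a store, polls until status === "completed", then immediately sends
// each prompt in order and records the input token count of every response.
async function reproduce(
  client: ReproClient,
  fileId: string,
  prompts: string[],
  model: string,
  pollMs = 1000,
  maxPolls = 600,
): Promise<number[]> {
  let store = await client.vectorStores.create({
    name: "repro-store", // hypothetical name
    file_ids: [fileId],
  });

  let polls = 0;
  while (store.status !== "completed") {
    if (++polls > maxPolls) {
      throw new Error(`store ${store.id} never reached "completed"`);
    }
    await sleep(pollMs);
    store = await client.vectorStores.retrieve(store.id);
  }

  const inputTokens: number[] = [];
  for (const prompt of prompts) {
    // No delay here on purpose: the report is about calls made right after
    // the store reports completion.
    const response = await client.responses.create({
      model,
      input: prompt,
      tools: [{ type: "file_search", vector_store_ids: [store.id] }],
    });
    inputTokens.push(response.usage?.input_tokens ?? 0);
  }
  return inputTokens;
}
```

If the behavior described here is present, the first few entries of the returned array should sit near the prompt-only size while later entries are much larger.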
Evidence
I ran 8 generation passes over the same 11 comments against the same document (this pipeline answers people’s comments against a source document). Each run creates a new vector store, polls to completion, then processes comments in batches of 5.
Input token counts directly measure how much content file_search returned (higher = more retrieved chunks). The system prompt + comment text alone is ~9-10k tokens.
Run: 1 (pre-existing store attached) (Expected Behavior)
Reasoning: low
Batch 1 comment 1: 77,916
Batch 1 comment 2: 76,582
Batch 1 comment 5: 93,939
Batch 2 (all): 25-43k
────────────────────────────────────────
Run: 2 (Bug Present in runs 2-7)
Reasoning: low
Batch 1 comment 1: 10,902
Batch 1 comment 2: 9,535
Batch 1 comment 5: 110,223
Batch 2 (all): 25-43k
────────────────────────────────────────
Run: 3
Reasoning: low
Batch 1 comment 1: 9,578
Batch 1 comment 2: 10,962
Batch 1 comment 5: 72,879
Batch 2 (all): 17-53k
────────────────────────────────────────
Run: 4
Reasoning: none
Batch 1 comment 1: 9,448
Batch 1 comment 2: 10,298
Batch 1 comment 5: 9,722
Batch 2 (all): 17-26k
────────────────────────────────────────
Run: 6
Reasoning: none
Batch 1 comment 1: 9,471
Batch 1 comment 2: 9,585
Batch 1 comment 5: 10,289
Batch 2 (all): 25-26k
────────────────────────────────────────
Run: 7
Reasoning: none
Batch 1 comment 1: 10,911
Batch 1 comment 2: 9,630
Batch 1 comment 5: 10,289
Batch 2 (all): 25-26k
Key observations:
- ~9-10k input = zero file_search content returned. The model receives only the system prompt and comment text. It searches the store, gets nothing back, and responds with “The administrative record does not contain…”. That answer is factually wrong: the content is in the store, it is just not searchable yet.
- Run 2 batch 1 shows the index coming online in real time: 10k → 10k → 60k → 27k → 110k across 5 sequentially-processed comments in the same batch.
- Run 1 had a second, pre-existing vector store (created in a prior session) attached alongside the new one. That run had no retrieval failures because the pre-indexed store provided content immediately.
- Batch 2 is always fine (~17-53k input across the runs above) because by the time batch 1 finishes processing (~60-90 seconds), the index is “fully warm”.
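The first observation suggests a crude check on the client side: compare each response's input token count against the prompt-only baseline. The helper below is a hypothetical sketch; the function name and the 2,000-token slack are assumptions, and the 10,000-token baseline comes from the ~9-10k figure above.

```typescript
// Hypothetical helper: flags a response whose input token count is close to
// the prompt-only baseline, meaning file_search likely contributed nothing.
function looksLikeEmptyRetrieval(
  inputTokens: number,
  baselineTokens: number,
  slackTokens = 2000, // assumed margin for prompt-size variation
): boolean {
  return inputTokens <= baselineTokens + slackTokens;
}

// Token counts taken from run 2, batch 1 (comments 1, 2, and 5).
const run2 = [10902, 9535, 110223];
const flags = run2.map((tokens) => looksLikeEmptyRetrieval(tokens, 10000));
// flags is [true, true, false]
```

This only detects the symptom after the fact, and it cannot tell an unready index apart from a query that genuinely matches nothing.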
Impact
- Silent data loss. The model produces confident but content-free responses. There is no error or signal that retrieval failed; the file_search tool simply returns no results. The consumer cannot distinguish “index not ready” from “content not found.”
- Non-deterministic output quality. Identical inputs produce dramatically different outputs depending on timing relative to index creation. This is invisible without inspecting token counts, and left me scratching my head for quite a while.
- No workaround signal. There’s no file_counts.indexed field, no search-readiness endpoint, and no error on the file_search tool call. The only way to detect this is to monitor input token counts or do a test query, neither of which the API is designed to support.
Expected behavior
Either:
- status: "completed" should mean the index is queryable, i.e. completion is not reported until search works, OR
- Add a distinct status like "search_ready" that indicates queryability, OR
- Have the file_search tool return an error/warning when querying an index that isn’t fully propagated (e.g., "status": "index_warming" in the tool result), so consumers can retry
Current workaround
I added a post-poll delay and a test probe query loop on my side, which has worked so far, but this is guesswork since there’s no API signal for when the index is actually ready.
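A delay-plus-probe loop of this kind might look like the sketch below. It is not the exact code from my pipeline. The probe callback is a hypothetical function that runs one file_search-backed query for content known to be in the document and returns that response's input token count; all default values are assumptions, with the 5-second initial delay taken from the 5+ seconds noted in the summary.

```typescript
interface WaitOptions {
  baselineTokens: number; // prompt-only input size, about 10k in this report
  slackTokens?: number; // assumed margin above the baseline
  initialDelayMs?: number; // fixed post-poll delay before the first probe
  retryDelayMs?: number; // pause between probes
  maxAttempts?: number; // give up after this many probes
}

const pause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Resolves to true once a probe returns clearly more input tokens than the
// prompt alone, and to false if that never happens within maxAttempts.
async function waitUntilSearchable(
  probe: () => Promise<number>,
  options: WaitOptions,
): Promise<boolean> {
  const {
    baselineTokens,
    slackTokens = 2000,
    initialDelayMs = 5000,
    retryDelayMs = 2000,
    maxAttempts = 30,
  } = options;

  await pause(initialDelayMs);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const inputTokens = await probe();
    if (inputTokens > baselineTokens + slackTokens) {
      return true; // retrieved chunks are showing up in the input
    }
    if (attempt < maxAttempts) {
      await pause(retryDelayMs);
    }
  }
  return false; // caller decides whether to fail or proceed anyway
}
```

The probe has to ask about something that is definitely in the document. Otherwise a legitimately empty result looks the same as an index that is not ready, which is the ambiguity this report is about.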