[Error: 404 No file found with id 'file-RXt3Me3fAzy3ZEj9xy6ZLw' in vector store 'vs_68fe949dc6008191ade830d6f9e5935c'.]
Some extra error details: request_id: 'req_0a41b55322de4f5fbf4b7305b808ede6', error: { message: 'No file found with id \'file-RXt3Me3fAzy3ZEj9xy6ZLw\' in vector store \'vs_68fe949dc6008191ade830d6f9e5935c\'.', type: 'invalid_request_error', param: null, code: null }, code: null, param: null, type: 'invalid_request_error' }
However, when I check the vector store via the OpenAI API dashboard, the file exists and is successfully attached to the vector store.
I tried an alternate method of uploading a file to the vector store by doing this:
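A sketch of one such alternate path, not necessarily the exact code used here, with a placeholder file name and store ID: upload the file first, then attach it in a separate call.

from openai import OpenAI

client = OpenAI()

# Upload the raw file first...
uploaded = client.files.create(file=open("report.pdf", "rb"), purpose="assistants")

# ...then attach it to an existing vector store in a second call
vs_file = client.vector_stores.files.create(
    vector_store_id="vs_123",      # placeholder vector store ID
    file_id=uploaded.id,
)
print(vs_file.id, vs_file.status)  # typically "in_progress" right after attaching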
I can confirm, independently of the polling method the API SDK offers, that the vector store service is broken…again.
A script that is basically “guaranteed to run”, creating its own new vector store and attaching itself by file name, still fails with:
openai.NotFoundError: Error code: 404 - {'error': {'message': "No file found with id 'file-DcBL8Kq6ZcRnGdnyDFYirr' in vector store 'vs_68fedd528f60819191cd84d21a5a2a59'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
I had to add a 5-second sleep before vector_stores.files.retrieve() to get success. The SDK polling helper, which checks immediately, obviously cannot adapt to that.
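A minimal sketch of that workaround (the sleep length and the placeholder IDs are illustrative):

import time
from openai import OpenAI

client = OpenAI()
vs_id = "vs_123"        # placeholder: an existing vector store ID
file_id = "file-abc"    # placeholder: an already-uploaded file ID

client.vector_stores.files.create(vector_store_id=vs_id, file_id=file_id)
time.sleep(5)  # without this pause, the immediate retrieve returned 404
vs_file = client.vector_stores.files.retrieve(file_id=file_id, vector_store_id=vs_id)
print(vs_file.status)   # "in_progress" or "completed"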
We’re experiencing the same issue. We’re also seeing that the GET endpoint https://api.openai.com/v1/vector_stores/{vector_store_id}/files/{file_id}/content
is failing and consistently returning “Not found”. Meanwhile, the search endpoint is still working.
Error creating files: Error code: 404 - {'error': {'message': "No file found with id 'file-Uzx4KBALYwxHUwbLV8hoGJ' in vector store 'vs_68ff70740380819199a336c074652ac9'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
A demonstration that will run, using your own file name to upload:

from openai import OpenAI; cl = OpenAI()
filename = "vs_meradate_demo.py"
vs_id = cl.vector_stores.create(name="vstest").id
def
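The snippet breaks off at the def; a hypothetical completion, not the original code, that reproduces the immediate 404 could look like:

from openai import OpenAI, NotFoundError

cl = OpenAI()
filename = "vs_meradate_demo.py"    # the script uploads itself, so save it under this name
vs_id = cl.vector_stores.create(name="vstest").id
file_id = cl.files.create(file=open(filename, "rb"), purpose="assistants").id
cl.vector_stores.files.create(vector_store_id=vs_id, file_id=file_id)
try:
    # an immediate check, which is exactly what the SDK poll helpers do first
    print(cl.vector_stores.files.retrieve(file_id=file_id, vector_store_id=vs_id).status)
except NotFoundError as err:
    print("404 right after attach:", err)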
@AndiL A note on this issue… I think a side effect of this is that the initial query of a vector store after adding files comes back with a file count of ZERO and a status of “completed”, even though the vector store creation call returned a file count of X (in my case 1) and “in_progress”.
If I poll the vector store until I actually get a file count, the status pops back to “in_progress”, and I continue to poll until it CORRECTLY shows a status of “completed”.
I would expect the initial retrieval of the vector store to have a correct file count and correct status of “in_progress”.
It would be great if this issue were fixed, but that at least is a workaround instead of waiting a random amount of time and hoping the file is available.
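A sketch of that polling workaround, checking the vector store itself until the file count and status agree (the interval and timeout are arbitrary choices):

import time
from openai import OpenAI

client = OpenAI()
vs_id = "vs_123"    # placeholder: the vector store the files were just added to

deadline = time.time() + 120
while time.time() < deadline:
    vs = client.vector_stores.retrieve(vs_id)
    # skip past the early anomaly: zero files reported alongside a "completed" status
    if vs.file_counts.total > 0 and vs.status == "completed":
        break
    time.sleep(2)
else:
    raise TimeoutError("vector store never reported a completed, non-zero file count")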
We resolved this by adding a short delay between create and poll. This seemed to fix it, but out of paranoia we ALSO added a retry mechanism to give create as much time as it needs to resolve.
This solved our problem, for the first time since it mysteriously appeared a few days ago.
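Roughly what that looks like (the delay, retry count, and function name here are arbitrary choices, not an official pattern; a fuller version of the same idea appears further down the thread):

import time
from openai import OpenAI, NotFoundError

client = OpenAI()

def attach_with_retry(vs_id: str, file_id: str, delay: float = 3.0, retries: int = 5):
    client.vector_stores.files.create(vector_store_id=vs_id, file_id=file_id)
    time.sleep(delay)                       # short delay between create and poll
    for attempt in range(1, retries + 1):
        try:
            vs_file = client.vector_stores.files.retrieve(
                file_id=file_id, vector_store_id=vs_id
            )
            if vs_file.status in ("completed", "failed"):
                return vs_file
        except NotFoundError:
            pass                            # record not visible yet; give create more time
        time.sleep(delay * attempt)
    raise TimeoutError(f"{file_id} never became visible in {vs_id}")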
One can imagine the scenario here: a database propagation delay before the object behind the retrieval path, addressed by vector store ID and file ID, is ready. The request hits:
GET https://api.openai.com/v1/vector_stores/{vector_store_id}/files/{file_id}
A 404 status should be an expected, anticipated part of the lifecycle, along with the knowledge that immediate polling is futile.
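Checked with a raw request (placeholder IDs; no SDK involved), that expectation looks like this:

import os
import requests

vs_id = "vs_123"      # placeholders for illustration
file_id = "file-abc"
resp = requests.get(
    f"https://api.openai.com/v1/vector_stores/{vs_id}/files/{file_id}",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
)
if resp.status_code == 404:
    print("not propagated yet; retry later")    # expected lifecycle state, not a hard failure
else:
    resp.raise_for_status()
    print(resp.json()["status"])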
Quite frankly, the SDK polling method does not sit at the right level in code. Taking only a file ID and a vector store ID, it has no knowledge of the size of the upload or the expected latency of document extraction and chunk embedding, nor does it offer parameters for passing in that intelligence.
Giving OpenAI the benefit of the doubt, they don’t describe using their simple polling function in the file search documentation. The steps there for file search are upload, add/attach, and check. However, the idea of listing all files with .vector_stores.files.list() as the “check” for a single file attachment is also poor, because the list method cannot return anything over 10,000 files nor offer a higher limit, and polling a huge list per file would be silly.
The “retrieval” (semantic search endpoint) documentation, however, does offer .vector_stores.files.upload_and_poll() in that same slot, now also proven poor. The name is terrible too, as the method doesn’t upload anything over the network.
The method that should be used is .vector_stores.files.retrieve().
The polling needs application-level consideration; it should:
tolerate an initial 404
build in a minimum delay expectation based on file size
tune that delay by file type (a PDF, for instance, is more complex to process but may yield fewer extracted tokens)
poll aggressively within the expected window, for the sake of user experience
back off once that expectation has been missed
time out into a failure state, with the timeout also extrapolated from the file type and size.
With this issue now going on a week, for this post I whipped up a mod of my “real uploader” that tolerates an initial 404. You can port it and add some brains:
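A sketch in that spirit (a stand-in, not the original uploader; the per-type delays, intervals, and timeout formula are invented tuning knobs rather than measured values):

import time
from pathlib import Path
from openai import OpenAI, NotFoundError

client = OpenAI()

# Invented tuning knobs: expected seconds of processing per megabyte, by file type.
# PDFs get more headroom because extraction is more complex.
PER_MB_DELAY = {".pdf": 8.0, ".docx": 5.0, ".txt": 2.0}
DEFAULT_PER_MB_DELAY = 4.0
FAST_POLL = 1.0      # aggressive polling inside the expected window
SLOW_POLL = 10.0     # back-off polling once the expectation is missed

def attach_and_wait(vs_id: str, path: str):
    """Upload a local file, attach it to a vector store, and wait for processing,
    treating an early 404 on retrieve as a normal part of the lifecycle."""
    p = Path(path)
    size_mb = max(p.stat().st_size / 1_000_000, 0.1)
    per_mb = PER_MB_DELAY.get(p.suffix.lower(), DEFAULT_PER_MB_DELAY)
    expected = 2.0 + per_mb * size_mb          # minimum delay expectation
    timeout = max(60.0, expected * 10)         # failure timeout from type and size

    with p.open("rb") as fh:
        file_id = client.files.create(file=fh, purpose="assistants").id
    client.vector_stores.files.create(vector_store_id=vs_id, file_id=file_id)

    start = time.time()
    time.sleep(min(expected, 5.0))             # never check immediately
    while True:
        elapsed = time.time() - start
        try:
            vs_file = client.vector_stores.files.retrieve(
                file_id=file_id, vector_store_id=vs_id
            )
            if vs_file.status == "completed":
                return vs_file
            if vs_file.status == "failed":
                raise RuntimeError(f"processing failed: {vs_file.last_error}")
        except NotFoundError:
            pass                               # expected early on; keep waiting
        if elapsed > timeout:
            raise TimeoutError(f"{p.name} not ready after {elapsed:.0f}s")
        time.sleep(FAST_POLL if elapsed < expected else SLOW_POLL)

Call it as attach_and_wait("vs_123", "report.pdf") with your own store ID and path.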
Better would be:
No OpenAI SDK bloat for simple API methods
No OpenAI vector stores if you don’t want a pattern of downtime and, now, only the cheapest “small” embeddings
No file search tool if you don’t want injections saying “user uploaded files”