I did something similar to what OpenAI describes in their docs for creating a vector store from multiple files using a file batch:
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create a vector store called "Financial Statements"
vector_store = client.beta.vector_stores.create(name="Financial Statements")

# Ready the files for upload to OpenAI
file_paths = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"]
file_streams = [open(path, "rb") for path in file_paths]

# Use the upload and poll SDK helper to upload the files, add them to the vector store,
# and poll the status of the file batch for completion.
try:
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id, files=file_streams
    )
finally:
    # Close the file handles whether or not the upload succeeded.
    for stream in file_streams:
        stream.close()

# You can print the status and the file counts of the batch to see the result of this operation.
print(file_batch.status)
print(file_batch.file_counts)
```
It turned out that file_batch.status returned 'failed'. However, when I checked my OpenAI dashboard, the files had been uploaded to Storage and attached to the vector store as intended.
When I called client.beta.vector_stores.retrieve(vector_store_id=<VECTOR_STORE_ID>).file_counts, the files I intended to upload were already counted there. When I chatted with the Assistant using that vector store, it answered questions based on the uploaded files quite well.
The file batch, on the other hand, reported a failed status, and its file_counts showed a total of zero files.
What happened to the file batch? Has anyone faced a similar issue?
I've also tried the same thing through OpenAI's front end. Everything appeared to work fine there, but the response from the file_batches endpoint was the same.
I'm facing the same issue. The file batch status comes back as failed, but the assistant seems to work fine afterwards. I'm using this in my live application, so for now I've just commented out the status check, but I hope OpenAI fixes this soon.
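Rather than removing the status check entirely, one option is to fall back to the vector store's own counts (the same `file_counts` returned by `client.beta.vector_stores.retrieve(...)`) whenever the batch reports anything other than `completed`. Below is a minimal sketch of such a check. The function name `upload_looks_successful` and the `expected_total` parameter are hypothetical, and it assumes the counts object exposes `completed`, `failed` and `in_progress` attributes as the SDK's `file_counts` does. A `SimpleNamespace` stands in for the real counts so the sketch runs without an API key.

```python
from types import SimpleNamespace


def upload_looks_successful(batch_status: str, store_counts, expected_total: int) -> bool:
    """Hypothetical fallback check for a file batch whose status may be misreported.

    Trust the batch when it says "completed"; otherwise decide from the vector
    store's own file_counts (e.g. vector_stores.retrieve(...).file_counts).
    """
    if batch_status == "completed":
        return True
    # Fall back to the vector store: nothing still processing, nothing failed,
    # and at least as many completed files as we uploaded.
    return (
        store_counts.in_progress == 0
        and store_counts.failed == 0
        and store_counts.completed >= expected_total
    )


if __name__ == "__main__":
    # Stand-in for the vector store's file_counts; with the real client you would use
    # client.beta.vector_stores.retrieve(vector_store_id=...).file_counts instead.
    counts = SimpleNamespace(completed=2, failed=0, in_progress=0, cancelled=0, total=2)
    print(upload_looks_successful("failed", counts, expected_total=2))
```

The `>= expected_total` comparison assumes a freshly created vector store, as in the original snippet; if the store already held files, you would compare against the count before the upload instead.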
However, I could see the vector store I tried to create in the Storage section of the OpenAI playground, and I was able to create an assistant that could access and search the files I put in it.
It seems the bug is in how the file batch status is reported, not in creating the vector store. As a workaround, I don't rely on the batch status and instead wait on each file in the vector store individually:
```python
import asyncio

# Upload the batch (run inside an async function, using an AsyncOpenAI client).
# upload_file_paths: the files to upload, e.g. pathlib.Path objects or open binary file handles.
await openai_client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=openai_vector_store.id, files=upload_file_paths
)

# List all files in the store. Results are paginated, so collect every page,
# including the first and the last one.
vector_store_files = []
page = await openai_client.beta.vector_stores.files.list(
    vector_store_id=openai_vector_store.id
)
vector_store_files.extend(page.data)
while page.has_next_page():
    page = await page.get_next_page()
    vector_store_files.extend(page.data)

# Use asyncio.gather to wait until every individual file in the vector store has
# finished processing (poll returns once a file is no longer "in_progress", i.e. it is
# "completed", "failed" or "cancelled"). Don't poll too fast if you have many files,
# because there is a separate per-minute limit on polling.
await asyncio.gather(
    *[
        openai_client.beta.vector_stores.files.poll(
            file_id=vector_store_file.id,
            vector_store_id=openai_vector_store.id,
            poll_interval_ms=60 * 1000,
        )
        for vector_store_file in vector_store_files
    ]
)
```