Hello,
When I create a vector store with file_ids, the attached file gets marked as failed in the OpenAI web UI. The file itself uploads without issue but can't be processed.
I’m creating the vector store like this:
remote_vs = self.client.beta.vector_stores.create(
    name=self.vector_store.name,
    file_ids=[file.id for file in self.vector_store.files],
    chunking_strategy={
        "type": "static",
        "static": {
            "max_chunk_size_tokens": 2048,
            "chunk_overlap_tokens": 256,
        },
    },
)
logging.info("Remote Vector Store Data: %s", remote_vs)
This is logged:
Remote Vector Store Data: VectorStore(id='vs_47QQ6T2PoUzwIi7kjkgjyj2c', created_at=1719528823, file_counts=FileCounts(cancelled=0, completed=0, failed=0, in_progress=1, total=1), last_active_at=1719528823, metadata={}, name='StableDiffusionData', object='vector_store', status='in_progress', usage_bytes=0, expires_after=None, expires_at=None)
The file is subsequently marked as failed when I go and check the web UI.
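Since the VectorStore object itself doesn't show the per-file error, here is a small sketch of how I can pull each file's status and last_error from the API instead of relying on the web UI (assuming files.list returns the same last_error field that shows up in the create_and_poll output further down):

# List the files attached to the vector store and log each one's status and error,
# so the failure reason is visible without opening the web UI.
for vs_file in self.client.beta.vector_stores.files.list(
    vector_store_id=remote_vs.id,
):
    logging.info(
        "Vector store file %s: status=%s, last_error=%s",
        vs_file.id,
        vs_file.status,
        vs_file.last_error,
    )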
The file is a JSON file that is 87,003,153 bytes (about 87 MB).
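For reference, this is a rough sketch of how I can estimate the file's token count with tiktoken (the filename is a placeholder, and cl100k_base may not be exactly the tokenizer the API uses, so it's only an approximation):

import tiktoken

# Rough token count of the JSON file; purely an estimate on my side.
enc = tiktoken.get_encoding("cl100k_base")
with open("stable_diffusion_data.json", "r", encoding="utf-8") as f:
    text = f.read()
print(f"Approximate token count: {len(enc.encode(text)):,}")

Even a crude estimate of ~4 characters per token would put 87,003,153 bytes at roughly 20 million tokens.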
Also:
If I try using client.beta.vector_stores.files.create_and_poll() like this:
attached = self.client.beta.vector_stores.files.create_and_poll(
    vector_store_id=self.vector_store.id,
    file_id=file.id,
)
logging.info("Vector file attachment status: %s", attached)
It also fails, but at least it gives me a reason.
This is logged:
Vector file attachment status: VectorStoreFile(id='file-MU2BzuNwxpTPsmSNBD4Rith2', created_at=1719529404, last_error=LastError(code='invalid_file', message='The file could not be parsed because it is too large.'), object='vector_store.file', status='failed', usage_bytes=0, vector_store_id='vs_zjv1pOXCfWeycOOhGoMktXYd', chunking_strategy=ChunkingStrategyStatic(static=ChunkingStrategyStaticStatic(chunk_overlap_tokens=400, max_chunk_size_tokens=800), type='static'))
Is this file actually too big, even though the documentation says the limit is 512 MB with up to 5M tokens per file?
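In case the per-file token cap is the real limit here, this is a rough workaround I'm considering (a sketch only, assuming the data is a top-level JSON array; the filename, part count, and helper name are placeholders, and self.client / self.vector_store are the same objects as in the code above): split the array into several smaller JSON files and attach each one to the same vector store.

import json

def split_json_array(path: str, parts: int) -> list[str]:
    # Load the top-level JSON array and write it back out as `parts` smaller files.
    with open(path, "r", encoding="utf-8") as f:
        items = json.load(f)
    chunk_size = -(-len(items) // parts)  # ceiling division
    out_paths = []
    for i in range(parts):
        chunk = items[i * chunk_size : (i + 1) * chunk_size]
        if not chunk:
            break
        part_path = f"{path}.part{i}.json"
        with open(part_path, "w", encoding="utf-8") as f:
            json.dump(chunk, f)
        out_paths.append(part_path)
    return out_paths

for part_path in split_json_array("stable_diffusion_data.json", parts=8):
    # Upload each part and attach it to the existing vector store.
    uploaded = self.client.files.create(file=open(part_path, "rb"), purpose="assistants")
    self.client.beta.vector_stores.files.create_and_poll(
        vector_store_id=self.vector_store.id,
        file_id=uploaded.id,
    )

If the 5M-token limit really is per file, splitting like this should keep each part comfortably under it, but I'd still like to know whether that limit is what I'm hitting.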