Add files to existing vector store

Hi,
I want to add files to an existing vector store, instead of creating a new vector store each time.

Right now, as I understand from the documentation, the only way to add files to an existing vector store is to retrieve all the file IDs from the vector store, add the files I want to that list, and create a new vector store with ‘beta.vector_stores.file_batches.create_and_poll’. That seems like a long process, and I would have to keep deleting old versions of the vector store I want to keep.

Any better solutions? Thank you!

https://platform.openai.com/docs/api-reference/vector-stores-file-batches

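from openai import OpenAI
import time

client = OpenAI()

# Attach already-uploaded files to an existing vector store in one batch
batch_add = client.beta.vector_stores.file_batches.create(
  vector_store_id="vs_abc123",
  file_ids=["file-abc123", "file-abc456"]
)

# The returned object is a snapshot, so re-fetch it until processing ends
while batch_add.status == "in_progress":
  time.sleep(1)
  batch_add = client.beta.vector_stores.file_batches.retrieve(
    batch_id=batch_add.id, vector_store_id="vs_abc123"
  )
print(batch_add.status)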

And is there a reason why it would take more than an hour for two example text files containing only one character each? It has been ‘in_progress’ for that long.

No, it’s not normal. Is it still in progress?

Yes, but after retrieving it I see that it has failed. I created the vector store like this:

vector_store = client.beta.vector_stores.create(name="Prueba Archivos-1")
# vector_store: VectorStore(id='vs_abc', created_at=1716207115, file_counts=FileCounts(cancelled=0, completed=0, failed=0, in_progress=0, total=0), last_active_at=1716207115, metadata={}, name='Name', object='vector_store', status='completed', usage_bytes=0, expires_after=None, expires_at=None)

And after running your code I get:

client.beta.vector_stores.file_batches.retrieve(batch_id = batch_add.id, vector_store_id= "vs_abc")
# VectorStoreFileBatch(id='vsfb_123', created_at=1716206526, file_counts=FileCounts(cancelled=0, completed=0, failed=2, in_progress=0, total=2), object='vector_store.file_batch', status='failed', vector_store_id='vs_abc')
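Part of the confusion may come from the object returned by `file_batches.create` being a snapshot: printing its `status` after `time.sleep(1)` shows the state at creation time, not the current one. Re-fetching it, as with the `retrieve` call above, gives the real status. Below is a minimal polling sketch with a deadline so a stuck or failed batch surfaces quickly; `wait_for_batch` and its parameters are hypothetical helpers, not part of the SDK.

```python
import time
from typing import Any, Callable

# Batch statuses after which nothing more will happen
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_batch(fetch: Callable[[], Any], timeout_s: float = 300.0,
                   interval_s: float = 1.0) -> Any:
    """Re-fetch a file batch until it reaches a terminal status or the deadline passes.

    `fetch` is any zero-argument callable returning an object with a `.status`
    attribute, e.g. a lambda wrapping file_batches.retrieve (see usage below).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        batch = fetch()  # fresh copy each time, never the stale snapshot
        if batch.status in TERMINAL_STATUSES:
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {batch.status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Usage (assumes the openai Python SDK with the beta vector store namespace):
# batch = wait_for_batch(lambda: client.beta.vector_stores.file_batches.retrieve(
#     batch_id=batch_add.id, vector_store_id="vs_abc"))
# print(batch.status, batch.file_counts)
```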

Okay, the issue was that the files contained only one character each, so they were too short to be processed.
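If test files are generated programmatically, a quick local check can flag suspiciously short ones before they are uploaded. A minimal sketch; `MIN_CHARS = 20` is an arbitrary, hypothetical threshold, since the actual cut-off the service applies isn't given in this thread.

```python
from pathlib import Path

# Hypothetical threshold: the real minimum isn't documented here, adjust after testing
MIN_CHARS = 20


def split_by_length(paths, min_chars=MIN_CHARS):
    """Separate text files that look long enough to index from ones that are probably too short."""
    ok, too_short = [], []
    for p in map(Path, paths):
        text = p.read_text(encoding="utf-8", errors="replace")
        # Ignore surrounding whitespace so a lone character plus newline still counts as short
        (ok if len(text.strip()) >= min_chars else too_short).append(p)
    return ok, too_short
```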

How lucky I am to have seen Alberto’s post about the files being too short.

While using openai.beta.vectorStores.fileBatches.uploadAndPoll, I was frustrated that the request's status was failed even though I could see the files in the vector store.
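The batch `status` alone doesn't say how many files actually made it in; the `file_counts` field (visible in the `retrieve` output above: completed, failed, in_progress, cancelled, total) does. It is also possible that files which failed still appear in the store's file list with their own `failed` status, which would explain seeing them there. A sketch of classifying a finished batch by its counts; `batch_outcome` is a hypothetical helper, and the commented usage assumes the Python SDK's `upload_and_poll`, the counterpart of the Node `uploadAndPoll` call.

```python
def batch_outcome(batch) -> str:
    """Classify a finished file batch using file_counts rather than status alone."""
    counts = batch.file_counts
    if counts.failed == 0 and counts.completed == counts.total:
        return "all indexed"
    if counts.completed > 0:
        return f"partial: {counts.completed} indexed, {counts.failed} failed"
    return f"none indexed: {counts.failed} failed"


# Usage (assumes the openai Python SDK with the beta vector store namespace):
# from openai import OpenAI
# client = OpenAI()
# with open("a.txt", "rb") as f1, open("b.txt", "rb") as f2:
#     batch = client.beta.vector_stores.file_batches.upload_and_poll(
#         vector_store_id="vs_abc", files=[f1, f2])
# print(batch.status, batch_outcome(batch))
```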

My only question: how did you find out that the file size was the reason?

Hey! I’m happy to be helpful.

I knew because I created files with just one or two characters, so I figured maybe it wasn’t enough.
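Rather than guessing, the per-file error can also be read back: each vector store file object carries a `status` and a `last_error` (with `code` and `message`), and `file_batches.list_files` accepts a `filter` such as `"failed"`. A sketch assuming the Python SDK's beta namespace; `failure_reasons` is a hypothetical helper, and I haven't confirmed what message the API returns for a too-short file.

```python
def failure_reasons(files) -> dict:
    """Map file ID -> error message for vector store files whose status is 'failed'."""
    return {
        f.id: (f.last_error.message if f.last_error else "unknown")
        for f in files
        if f.status == "failed"
    }


# Usage (assumes the openai Python SDK with the beta vector store namespace):
# failed = client.beta.vector_stores.file_batches.list_files(
#     batch_id="vsfb_123", vector_store_id="vs_abc", filter="failed")
# print(failure_reasons(failed))
```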

Thanks, buddy. I was looking for this. Great help! ❤️

Exactly, same here. I wasted 3 hours on this, lol. Thanks @alberto.tomas

Hello Alberto. Just in case: have you come across a way to modify an already existing file batch? For example, there is already a file batch with 3 files, and I want to add one more file to it.
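As far as the file-batch endpoints linked above go (create, retrieve, cancel, list files), there doesn't appear to be a call that appends to an existing batch. A batch is only a unit of work; its files end up in the vector store either way, so adding one more file means attaching it to the same vector store, on its own or in a new batch. A sketch assuming the Python SDK's `vector_stores.files.create_and_poll` and `file_batches.create_and_poll`; `add_files` is a hypothetical wrapper.

```python
def add_files(client, vector_store_id, file_ids):
    """Attach already-uploaded files to an existing vector store and wait for indexing.

    One file uses the single-file endpoint; several use a new batch on the same store.
    """
    file_ids = list(file_ids)
    if not file_ids:
        raise ValueError("file_ids is empty")
    vs = client.beta.vector_stores
    if len(file_ids) == 1:
        return vs.files.create_and_poll(file_id=file_ids[0], vector_store_id=vector_store_id)
    return vs.file_batches.create_and_poll(vector_store_id=vector_store_id, file_ids=file_ids)


# Usage (assumes the openai Python SDK with the beta vector store namespace):
# from openai import OpenAI
# client = OpenAI()
# add_files(client, "vs_abc123", ["file-new789"])
```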

If I have all the files I want to attach to a vector store in my files area, is there a way to get a list of all those file_ids?

I'm struggling to move these files from the files area into the vector store I want them in. I can only get an upload to work when I grab each file ID individually and build an array of them.

You can list all the files with the “assistants” purpose. There’s an API method for that, which returns up to 10,000 at a time.
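In the Python SDK that method is `client.files.list(...)`, which accepts a `purpose` filter and whose result can be iterated across pages. A sketch that collects the IDs, with an optional (hypothetical) filename filter to narrow things down, since the list will likely include unrelated files. The resulting list can then be passed as `file_ids` to `file_batches.create`.

```python
def assistant_file_ids(files, name_filter=None):
    """Collect IDs of files with the 'assistants' purpose, optionally filtered by filename."""
    ids = []
    for f in files:
        if f.purpose != "assistants":
            continue
        if name_filter is not None and not name_filter(f.filename):
            continue
        ids.append(f.id)
    return ids


# Usage (assumes the openai Python SDK; iterating the page fetches further pages):
# ids = assistant_file_ids(client.files.list(purpose="assistants"),
#                          name_filter=lambda name: name.endswith(".pdf"))
```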

However, a typical organization will have tons of unrelated files, even if they are just attachments to user messages. Your application's database must keep track of the file IDs that you upload and connect to each vector store, organized by the Assistant or purpose they serve.
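A minimal sketch of that bookkeeping with the standard library's `sqlite3`; the table name, columns, and helper functions are all hypothetical, and any database would do. Recording the mapping at the moment you attach a file means you never have to reconstruct it from the full Files list.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a table mapping vector stores to the file IDs attached to them."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS vs_files (
               vector_store_id TEXT NOT NULL,
               file_id TEXT NOT NULL,
               purpose TEXT,
               PRIMARY KEY (vector_store_id, file_id))"""
    )
    return db


def record(db, vector_store_id, file_id, purpose):
    """Remember that a file was attached to a vector store; duplicates are ignored."""
    db.execute("INSERT OR IGNORE INTO vs_files VALUES (?, ?, ?)",
               (vector_store_id, file_id, purpose))
    db.commit()


def files_for(db, vector_store_id):
    """Return the file IDs recorded for one vector store, sorted for stable output."""
    rows = db.execute(
        "SELECT file_id FROM vs_files WHERE vector_store_id = ? ORDER BY file_id",
        (vector_store_id,),
    )
    return [r[0] for r in rows]
```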

There’s no bulk file uploader, which would be convenient. Uploading the files individually but in parallel could speed things up, but only because bandwidth is allotted per connection.

The API endpoint call also needs the correct purpose, “assistants” (or the undocumented user-data purpose).
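A sketch of uploading files one request each but several at a time with a thread pool, passing the `assistants` purpose to `client.files.create`; `upload_all` and the `max_workers=4` default are hypothetical. For what it's worth, the Python SDK's `file_batches.upload_and_poll` takes a `max_concurrency` argument, so it appears to do something similar before creating the batch.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(client, paths, purpose="assistants", max_workers=4):
    """Upload files with one request each, several in flight at once.

    Returns the new file IDs in the same order as `paths`.
    """
    def upload(path):
        with open(path, "rb") as fh:
            return client.files.create(file=fh, purpose=purpose).id

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order even though uploads finish out of order
        return list(pool.map(upload, paths))


# Usage (assumes the openai Python SDK):
# from openai import OpenAI
# ids = upload_all(OpenAI(), ["a.txt", "b.txt", "c.txt"])
# then attach them with file_batches.create(vector_store_id=..., file_ids=ids)
```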

Also, the data is scoped to the project of the API key.
