Max 100 files in vector store

According to https://platform.openai.com/docs/assistants/whats-new, file_search supports up to 10k files.

However, when I try to create my vector store:

from openai import OpenAI
client = OpenAI()

my_file_ids = [ ... ]  # 300 file IDs already uploaded to OpenAI
vector_store = client.beta.vector_stores.create(
    name="My Assistant V2", file_ids=my_file_ids
)

I get openai.BadRequestError: Error code: 400 - {'error': {'message': "Invalid 'file_ids': array too long. Expected an array with maximum length 100, but got an array with length 300 instead.", 'type': 'invalid_request_error', 'param': 'file_ids', 'code': 'array_above_max_length'}}

So how can file_search support 10k files if vector stores can only have 100 files? I thought the files must be in the vector store to be used?

Seems like the solution is to simply use multiple vector stores per assistant. In my case, I will need 3 vector stores with 100 files each.

Actually, ignore that. If I try to attach 3 vector stores to the assistant, I get an error:

"Invalid 'tool_resources.file_search.vector_store_ids': array too long. Expected an array with maximum length 1, but got an array with length 3 instead."

Yep, you discovered there’s one store per assistant.
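
For reference, here is a minimal sketch of attaching a single vector store to an assistant, which is the shape the error above is asking for. The helper name `create_file_search_assistant` and its `name` and `model` parameters are illustrative, not from this thread; `client` is assumed to be an `OpenAI()` instance using the `client.beta` namespace seen elsewhere in this thread.

```python
def create_file_search_assistant(client, name, model, vector_store_id):
    """Create an assistant whose file_search tool reads from one vector store.

    `client` is assumed to be an `openai.OpenAI()` instance. It is passed in
    so this helper has no hard dependency on the SDK.
    """
    return client.beta.assistants.create(
        name=name,
        model=model,
        tools=[{"type": "file_search"}],
        tool_resources={
            # Only one ID is accepted here, per the error message quoted above.
            "file_search": {"vector_store_ids": [vector_store_id]},
        },
    )
```

So all files need to end up in that one store, rather than being spread across several.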

If the create call rejects your input because of the maximum array length, you may have to attach the remaining files one at a time. This method takes a single file ID, not an array:

from openai import OpenAI
client = OpenAI()

vector_store_file = client.beta.vector_stores.files.create(
  vector_store_id="vs_abc123",
  file_id="file-abc123"
)
print(vector_store_file)

See how many threads of that you can blast in parallel…
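
A rough sketch of that parallel approach, using a thread pool around the same single-file call. The helper name `attach_files_in_parallel` and the default of 8 workers are my own assumptions; a sensible worker count depends on your rate limits, which I have not checked.

```python
from concurrent.futures import ThreadPoolExecutor


def attach_files_in_parallel(client, vector_store_id, file_ids, max_workers=8):
    """Attach each file ID to the vector store with its own request.

    `client` is assumed to be an `openai.OpenAI()` instance. It is passed in
    so this helper has no hard dependency on the SDK.
    """
    def attach(file_id):
        # Same call as in the snippet above: one file per request.
        return client.beta.vector_stores.files.create(
            vector_store_id=vector_store_id,
            file_id=file_id,
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in input order and re-raises the first
        # exception hit by any worker when the results are consumed.
        return list(pool.map(attach, file_ids))
```

Something like `attach_files_in_parallel(client, "vs_abc123", my_file_ids)` would then return one result object per file, in the same order as the IDs.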

May I ask: why use these built-in vector stores when they are so limited compared to alternatives like Chroma or Pinecone? I am considering doing something similar to OP but am not sure which route to take.

Thanks! I ended up using client.beta.vector_stores.file_batches.create_and_poll, which seems to take care of all that for me. Found it here: https://platform.openai.com/docs/assistants/tools/file-search/creating-vector-stores-and-adding-files

Not super intuitive but it works
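
In case it helps anyone else, this is roughly what that looks like for 300 file IDs. The helper names `chunked` and `add_files_in_batches` are made up for this sketch, and the batch size of 100 is a conservative assumption taken from the error message in the first post; I have not verified the actual per-batch limit. Newer SDK releases may expose these methods outside the `beta` namespace, so adjust the attribute path if needed.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_files_in_batches(client, vector_store_id, file_ids, batch_size=100):
    """Add file IDs to an existing vector store, `batch_size` IDs per request.

    `client` is assumed to be an `openai.OpenAI()` instance.
    """
    batches = []
    for group in chunked(file_ids, batch_size):
        # create_and_poll waits until the batch has finished processing
        # before returning, so batches are submitted one after another.
        batch = client.beta.vector_stores.file_batches.create_and_poll(
            vector_store_id=vector_store_id,
            file_ids=group,
        )
        batches.append(batch)
    return batches
```

The idea is to create the store without `file_ids` first, e.g. `vector_store = client.beta.vector_stores.create(name="My Assistant V2")`, and then call `add_files_in_batches(client, vector_store.id, my_file_ids)`.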


The limit of 100 applies to a single call; it is not a cap on the total number of files in a vector store. My store holds 487 files.

To add more than 100 files, you can use file batches as beyonce mentioned, or you can add them one by one or 100 at a time.


As far as I can tell, there is no way to have more than 20 files in the assistant playground interface though, because 1) it seems that you can use only one vector store at a time and 2) vector stores cannot be created with more than 20 files, or be modified above that limit. Unless I’ve missed something?

All of my files show up, though it does seem buggy. If I load too many by clicking "more" a few times, my browser freezes.

Using ChatGPT Custom GPTs, I could only upload 10 at a time, so there may be a UI limitation. I also couldn’t delete well through the web though it worked well via PowerShell/API.

Besides the initial create request, the API now has a method for sending multiple file IDs to a vector store:

https://platform.openai.com/docs/api-reference/vector-stores-file-batches/createBatch