Someone should seriously take care of the documentation, which is sorely lacking.
Example:
file_ids
array
Optional
A list of file IDs to add to the vector store. There can be a maximum of 10000 files in a vector store.
Question: if I don’t attach an array of file_ids, are all the files in the store used, or are they all ignored?
I don’t see this information written anywhere.
And this is just a simple example. There are more important cases, such as streaming, a very complex topic whose documentation fits on a postage stamp.
The reply above picked up on the “10000” and linked to something I wrote about that limit, but it does not answer the question.
An assistant’s vector store can be created without any documents, just to obtain its ID.
Then you add the document file IDs (for files you’ve already uploaded, each of which gave you a file ID).
Where a call accepts an “array”, you can list many files to be added all at once, instead of making many API calls to add single IDs.
The vector store retains every file added to it; you can add more, and you can delete files by ID. So, once the vector store holds the documents you want the assistant to draw on and it is attached to the assistant, you don’t have to keep referring to individual files.
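To make the create-then-add flow concrete, here is a minimal sketch rather than a definitive implementation. `build_store` is a hypothetical helper name, and `client` is assumed to be an `OpenAI()` instance from the official Python SDK. The vector store namespace has moved between SDK releases, so the sketch tries `client.vector_stores` first and falls back to `client.beta.vector_stores`; check which one your installed version exposes.

```python
def build_store(client, name, file_ids):
    """Create an empty vector store, then attach already-uploaded files in one batch.

    `client` is assumed to be an OpenAI() instance; `file_ids` are IDs returned
    by earlier file uploads. Returns the new vector store ID.
    """
    # Newer SDK releases expose client.vector_stores; older ones used
    # client.beta.vector_stores. This fallback is an assumption about your version.
    stores = getattr(client, "vector_stores", None) or client.beta.vector_stores

    # No file_ids passed here: the store starts empty and only yields its ID.
    store = stores.create(name=name)

    # One batch call adds many files at once instead of one call per file.
    # Ingestion runs asynchronously on the service side after this returns.
    if file_ids:
        stores.file_batches.create(vector_store_id=store.id, file_ids=list(file_ids))

    return store.id
```

With the real SDK this would be called as something like `build_store(OpenAI(), "client-a-docs", ["file-abc", "file-def"])`, where the store name and file IDs are placeholders. Omitting the file IDs only produces an empty store; it does not pull in other uploaded files.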
Searches against the vector database cover all the documents, and a vector store attached to an assistant and a separate vector store attached to a conversation thread are combined into one search. Files meant to shape assistant behavior and files that you might allow a user to upload are commingled in the same single search function, and the AI cannot tell who uploaded a file. That makes it problematic, on top of internal instructions that say “the user uploaded these files” even when they are part of the assistant’s skill set.
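The commingling described above can be pictured with a toy model. This is only an illustration of the behavior, not how the service implements retrieval; the function names and file IDs are made up. The first helper shows the `tool_resources` shape used to attach a store to an assistant or to a thread.

```python
def file_search_resources(vector_store_id):
    """Payload shape for attaching one vector store to an assistant or a thread."""
    return {"file_search": {"vector_store_ids": [vector_store_id]}}


def searchable_pool(assistant_store_files, thread_store_files):
    """Toy model: one search covers both stores, with no record of origin."""
    return sorted(set(assistant_store_files) | set(thread_store_files))


# Hypothetical file IDs, for illustration only.
assistant_files = ["file_pricelist", "file_troubleshooting"]  # developer-supplied
thread_files = ["file_user_upload"]                           # end-user upload

pool = searchable_pool(assistant_files, thread_files)
print(pool)
```

The dict from `file_search_resources` is what you would pass as `tool_resources=` when creating or updating an assistant, or when creating a thread. Once both are attached, the merged pool carries nothing that says which side a file came from, which is the problem.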
If you are looking at the API reference, which gives you only the parameter description that was pasted, you may want to click Documentation instead, which has more tutorial-style explanation. Then you can evaluate how this actually works and decide whether returning searched chunks of documents based on phrase similarity (and not on their source) and adding them as tokens to a chat (at a cost) is fit for your task.
No uploaded files are processed with document extraction or placed into a vector database if they aren’t specifically added to a vector store.
If I am an API developer and have one client with proprietary price lists and a troubleshooting database, I certainly wouldn’t want anything automatically added to another client’s assistant file search simply because I didn’t specify file IDs.