Our product uses the OpenAI API for a chat feature that runs user prompts against a client’s data set. The client data is organized into hundreds of files to give reference citations a useful level of granularity.
The client may specify an ad hoc set of data to query against, which requires a new vector store each time the data set changes. Good performance is essential; delays of several minutes while creating a new vector store result in a poor user experience.
We have found that calling vectorStores.fileBatches.createAndPoll() generally performs poorly. Even when called with just a handful of files, latency is at least 3-5 seconds and sometimes as long as 2 minutes, with no apparent pattern.
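For reference, this is roughly the flow we are timing. This is a sketch, not our exact code: the function name is mine, and depending on SDK version the vector store methods may sit under `client.beta.vectorStores` rather than `client.vectorStores`. The client is passed in as a parameter so the flow can be exercised with a stub.

```javascript
// Sketch of our current store-creation path (hypothetical helper name).
// `client` is an OpenAI SDK client (or a stub with the same shape);
// `fileIds` are IDs of files that have already been uploaded.
async function createStoreWithFiles(client, fileIds, name = "ad-hoc-data-set") {
  const store = await client.vectorStores.create({ name });
  // createAndPoll blocks until every file has been processed; this single
  // call is where we observe anywhere from 3 seconds to 2 minutes.
  const batch = await client.vectorStores.fileBatches.createAndPoll(store.id, {
    file_ids: fileIds,
  });
  if (batch.status !== "completed") {
    throw new Error(`file batch ended with status ${batch.status}`);
  }
  return store;
}
```

The vectorStores.create() call itself returns quickly; it is the createAndPoll() step, i.e. waiting for file processing, that dominates.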
On the other hand, specifying a set of fileIds directly when calling threads.create() performs much better: I have successfully passed over 100 fileIds and had the call return in around 1 second.
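Concretely, we attach the files through the Assistants v2 tool_resources shape. Building the request body in a small helper (the helper name and the up-front limit check are mine) also lets us enforce the 500-fileId cap before making the call:

```javascript
// Documented cap on file_ids when creating a vector store at thread creation.
const MAX_THREAD_FILE_IDS = 500;

// Hypothetical helper: builds the threads.create() request body that attaches
// an ad hoc vector store populated from `fileIds` for the file_search tool.
function buildThreadBody(fileIds) {
  if (fileIds.length > MAX_THREAD_FILE_IDS) {
    throw new Error(
      `threads.create accepts at most ${MAX_THREAD_FILE_IDS} file_ids`
    );
  }
  return {
    tool_resources: {
      file_search: {
        vector_stores: [{ file_ids: fileIds }],
      },
    },
  };
}
```

Usage would be along the lines of `const thread = await openai.beta.threads.create(buildThreadBody(fileIds));` — the call returns quickly, but the vector store it creates starts out in ‘in_progress’.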
Unfortunately, threads.create() does not seem to be reliable. There is a limit of 500 fileIds, and if I pass it a large number of larger files, many of them fail to process; sometimes the ingestion never finishes and the vector_store status remains ‘in_progress’ indefinitely. There also doesn’t seem to be a “poll” variant of threads.create() that returns only once the vector store has been successfully created.
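For now we poll the store ourselves after threads.create() returns. A minimal sketch of that loop, with the status lookup injected as a function (in real use it would be something like `async () => (await openai.beta.vectorStores.retrieve(storeId)).status`); the function name, interval, and timeout defaults are all mine:

```javascript
// Hand-rolled replacement for the missing "poll" variant of threads.create().
// `getStatus` is an async function returning the vector store's current
// status string; injecting it keeps this loop testable without network calls.
async function waitForVectorStore(
  getStatus,
  { intervalMs = 1000, timeoutMs = 120000 } = {}
) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === "completed") return status;
    if (status === "failed" || status === "expired") {
      throw new Error(`vector store ended with status ${status}`);
    }
    // Still "in_progress": wait and re-check.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("timed out waiting for vector store to complete");
}
```

The timeout is the part that matters for us: without it, a store stuck in ‘in_progress’ would leave us polling forever.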
So here are my questions:
1. Why is the fileBatches.createAndPoll() API so slow?
2. Is there a way to make threads.create() reliable?
3. Can files be added to the thread’s vector store incrementally while maintaining the performance advantage?
4. What, ultimately, is the preferred way to upload a large number of files to a vector store in a performant manner?