Assistants seem to negate the need to build one's own Retrieval-Augmented Generation (RAG) system, but with a limit of 20 files per Assistant, they are not yet ready to replace a solution that uses, say, thousands of files as documentation for the AI.
I have been thinking about keeping my own vector index over the large file collection: when a query returns file_ids, attach the corresponding files to the thread, and then for each subsequent user message run the same lookup again, attaching and detaching files as needed. That all seems very inefficient and likely slow.
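For concreteness, here is a minimal sketch of that loop, assuming the v1 Assistants API (where messages accept a `file_ids` list); `top_k_file_ids` is a hypothetical helper backed by my own vector index, not anything OpenAI provides:

```python
from openai import OpenAI

client = OpenAI()

def top_k_file_ids(query: str, k: int = 10) -> list[str]:
    """Hypothetical helper: search my own vector index (embeddings I
    maintain myself) and return the k most relevant uploaded file IDs."""
    ...  # embed the query, run a similarity search, map hits to file_ids

def ask(thread_id: str, question: str) -> None:
    # Re-rank on every user message and attach only the best matches,
    # staying under the 10-files-per-message limit.
    file_ids = top_k_file_ids(question, k=10)
    client.beta.threads.messages.create(
        thread_id=thread_id,
        role="user",
        content=question,
        file_ids=file_ids,  # per-message attachment (v1 Assistants API)
    )
```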
Is there an intended strategy for this? I would appreciate any guidance.
P.S. In any case, without streaming support, Assistants are not a good user experience yet: a loading GIF will always be less appreciated than watching partial text appear as it is generated.
I second this opinion. I was thinking the same thing.
At first I thought this was a per-organisation limit, but the way "organisation" works still feels broken to me.
The 100 GB total size limit is somewhat okay, but in my case I usually prefer many small files, so the file-count limit is the problem. Otherwise, Assistants would already completely replace what I've built so far (with a vector DB etc.).
Agreed, but as far as I understand there is no file-count limit when uploading with client.files.create, only a total size limit.
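That is, you can keep uploading files like this; the 20/10 caps only kick in when attaching files to an Assistant or a message (filename here is just an example):

```python
from openai import OpenAI

client = OpenAI()

# Upload as many files as total storage allows.
f = client.files.create(file=open("docs/guide.md", "rb"), purpose="assistants")
print(f.id)  # e.g. "file-abc123", usable as a file_id later
```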
Looking at the API, the file-count limits apply to the Assistant and to the message: 20 and 10, respectively.
So suppose you have 100 files in your client.files index: if there were an API feature to pick the top 20 or 10 most relevant files given the prompt, I would be okay with that.
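Until OpenAI exposes something like that, a rough client-side approximation is possible: rank the files yourself with embeddings and swap the Assistant's attachments before each run. A minimal sketch, assuming the v1 Assistants API (where assistants.update accepts `file_ids`); `select_top_files`, `summary_vectors`, and `ASSISTANT_ID` are illustrative names of my own:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def select_top_files(
    prompt: str, summary_vectors: dict[str, np.ndarray], k: int = 20
) -> list[str]:
    """summary_vectors maps file_id -> a precomputed embedding of a short
    summary of that file (computed once at upload time and cached)."""
    q = embed(prompt)
    q /= np.linalg.norm(q)
    # Cosine similarity against each cached summary embedding.
    scored = sorted(
        summary_vectors.items(),
        key=lambda item: float(q @ (item[1] / np.linalg.norm(item[1]))),
        reverse=True,
    )
    return [file_id for file_id, _ in scored[:k]]

# Example usage (ASSISTANT_ID and summary_vectors are placeholders):
# top = select_top_files("How do I configure webhooks?", summary_vectors, k=20)
# client.beta.assistants.update(assistant_id=ASSISTANT_ID, file_ids=top)
```

The obvious downside is the one raised above: re-ranking and re-attaching on every prompt adds latency, which is why a built-in top-k retrieval over the whole files index would be much better.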