So now I'm building a RAG app, and I'm confused about whether I should use the OpenAI Assistants API or the older vector DB + LangChain approach. I expect PDF files to be uploaded for my bot. Can someone tell me, based on trial and error, which is better and why?
The advantage of using the Assistants API is that you take a file, provide it to the assistant, and can immediately start your RAG application.
The advantage of building your own RAG from the ground up is that you have control over the quality of the retrieval. You decide on the chunk size, how the best matches are selected for a user query, how many results you provide to the model, in which order, and so on. You also control the costs: you can manage how often the database is queried and how many input tokens are supplied to the model.
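To make those knobs concrete, here is a minimal sketch of a hand-rolled retrieval step. Everything in it is an assumption for illustration: the function names (chunk_text, embed, retrieve) are hypothetical, and embed is a bag-of-words stand-in for a real embedding model, so the scores only show the mechanics.

```python
import math
import re
from collections import Counter

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into windows of chunk_size words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]

def embed(text):
    # Stand-in for a real embedding model (assumption): lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=3):
    """Return the top_k (score, chunk) pairs, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

chunks = [
    "Invoices are due within 30 days.",
    "Refunds take 5 business days.",
    "Our office is closed on Sundays.",
]
hits = retrieve("when are invoices due", chunks, top_k=2)
```

chunk_size, overlap and top_k are exactly the parameters you get to tune yourself; top_k also caps how many tokens end up in the prompt.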
This means there is no one size fits all answer and you should experiment with what matches your use case.
Hmm, I see. Can I hear your opinion about the Assistants API? I’m not sure if I understood the pricing correctly, but do I pay daily for already stored files?
From the docs:
If you enable retrieval for a specific Assistant, all the files attached will be automatically indexed and you will be charged the $0.20/GB per assistant per day.
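As a rough worked example of that storage charge, the sketch below multiplies it out. The rate is the figure quoted above; the function name and the example sizes are made up for illustration, and it assumes the same files are attached to every assistant. It does not cover token costs.

```python
# Rate taken from the docs quote above: $0.20 per GB, per assistant, per day.
RATE_USD_PER_GB_PER_ASSISTANT_PER_DAY = 0.20

def retrieval_storage_cost(total_gb, assistants=1, days=1,
                           rate=RATE_USD_PER_GB_PER_ASSISTANT_PER_DAY):
    """Estimate the storage charge in USD for indexed files (hypothetical helper)."""
    return total_gb * assistants * days * rate

# Example (made-up numbers): 0.5 GB of PDFs on 2 assistants for 30 days.
monthly = retrieval_storage_cost(0.5, assistants=2, days=30)
print(f"{monthly:.2f} USD")  # roughly 6 USD for the month
```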
But I believe the issue will be that if the model starts querying the knowledge files and adding additional context to the user query, you will pay a lot more for input tokens on each turn of the conversation.
I recently had a conversation where things with the assistant retrieval got very costly very quickly, as the model added thousands of tokens to each turn of the conversation regardless.
So my best advice for using the Assistants API is to focus your attention on both the quality of the answers based on the retrieval and the overall costs of using the stock solution.
Then you can make an informed decision about which way to go. If you are just entering this area, then it’s generally good advice to build your own RAG anyway, so that you get a better grip on what to expect at what cost.
More in this conversation:
Having now implemented a similarity threshold for my search results, my RAG search is even more efficient and the gap is even wider.
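A similarity threshold like the one mentioned above can be sketched as follows. This is not the poster's actual implementation: apply_threshold is a hypothetical helper and the 0.75 cutoff is an arbitrary example value that you would tune against your own data.

```python
def apply_threshold(scored_chunks, min_score=0.75, max_results=5):
    """Keep only (score, chunk) pairs at or above min_score, best first.

    min_score=0.75 is an illustrative value, not a recommendation.
    """
    kept = [(score, chunk) for score, chunk in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:max_results]

# Made-up scores from a hypothetical vector search.
scored = [(0.91, "chunk a"), (0.42, "chunk b"), (0.78, "chunk c")]
context = apply_threshold(scored)
```

The point is that weak matches never reach the prompt, so a query with no good hits adds no retrieval tokens at all.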
Consider Assistants API to be a beta product with some way to go I suspect … (but packaging up functionality in that way is a totally reasonable goal).
The Assistants API has a file limit and can quickly stop working if you have a good number of uploads.
At the moment I prefer using a vector database myself: you have more control and it is cheaper, in my view. For instance, with a vector database you can prefilter what you want it to look at, whereas that is difficult with the Assistants.
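Prefiltering could look roughly like the sketch below: restrict the candidates by metadata first, then rank only those by similarity. The record layout ("id", "vec", "meta") and the filtered_search helper are assumptions for illustration; real vector databases expose their own filter syntax.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, records, where, top_k=3):
    """Rank only the records whose metadata matches every key/value in `where`."""
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return ranked[:top_k]

# Toy 2-dimensional vectors and a made-up "dept" tag.
records = [
    {"id": "a", "vec": [1.0, 0.0], "meta": {"dept": "hr"}},
    {"id": "b", "vec": [0.8, 0.6], "meta": {"dept": "legal"}},
    {"id": "c", "vec": [0.0, 1.0], "meta": {"dept": "legal"}},
]
legal_hits = filtered_search([1.0, 0.0], records, where={"dept": "legal"})
```

Record "a" is the closest vector to the query, but it is never considered because it fails the filter.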
There’s probably a place for both.
I am using Assistants to process incoming emails, which have attachments that need to be evaluated as well. Being able to create a thread, add the attachment and ‘go’ is great. The document is then uploaded somewhere else (or not, depending on the assistant outcome) and can then be discarded in OpenAI as well.
I mostly have that type of ‘incoming’ non-static document. So for that purpose the Assistant model is great.
One thing that is missing for me here:
A re-ranker. If OpenAI offered a cheap and fast re-ranker together with top-N limits for their RAG, I would not have to work on my own implementation.
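For anyone unsure what a re-ranker with a top-N limit means in practice, here is a minimal two-stage sketch. The scoring function is a deliberately crude stand-in (fraction of query words found in the passage); a real re-ranker would typically be a dedicated model such as a cross-encoder. All names here are hypothetical.

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def toy_rerank_score(query, passage):
    # Stand-in scorer (assumption): share of query words that appear in the passage.
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def rerank(query, candidates, top_n=2, score_fn=toy_rerank_score):
    """Re-score first-stage candidates and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda p: score_fn(query, p), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage vector search, in that order.
candidates = [
    "Shipping takes 3 days.",
    "Refund policy: refunds are issued within 5 days.",
    "How to request a refund for a damaged item.",
]
best = rerank("refund for damaged item", candidates, top_n=2)
```

The first stage can afford to be broad and cheap; the re-ranker then decides which few passages are worth paying input tokens for.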