After spending more than a year with Gen AI, I feel RAG is more of a problem than a solution: it is brittle, and there is little science to it. I wanted to check with this group whether anyone is aware of other techniques or ways to work more efficiently in this space. One thing I was expecting to see here is something similar to automated hyperparameter tuning in traditional ML. The idea: if I provide a set of retrieval techniques and a loss function, is there any product or process that will run the different retrieval techniques, calculate the loss, and automatically identify the best technique? I could not find any such product so far.
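For what it's worth, the search loop you're describing is simple enough to sketch yourself. Below is a minimal, hypothetical version: `retrievers` is a dict of callables you supply, `eval_set` is a labelled set of (query, relevant-chunk-ids) pairs, and the loss is a stand-in recall-based one — swap in whatever loss you actually care about.

```python
# Hypothetical sketch: treat retriever selection like hyperparameter search.
# `retrievers` and `eval_set` are stand-ins; plug in your own implementations.

def recall_loss(retrieved, relevant):
    """Loss = fraction of the relevant chunks the retriever missed."""
    hits = len(set(retrieved) & set(relevant))
    return 1.0 - hits / len(relevant)

def pick_best_retriever(retrievers, eval_set):
    """Run each retriever over the eval set, average the loss, return the best."""
    scores = {}
    for name, retrieve in retrievers.items():
        losses = [recall_loss(retrieve(query), relevant) for query, relevant in eval_set]
        scores[name] = sum(losses) / len(losses)
    best = min(scores, key=scores.get)
    return best, scores
```

The hard part is not the loop but building a good labelled `eval_set`; once you have one, any technique (BM25, dense, hybrid, rerankers) drops in as just another entry in the dict.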
@stevenic has done quite a bit, but he’s busy with a startup at the moment, I believe. If you search the forum, we’ve got a lot of great hidden gems / wisdom.
Hrm… I wonder if someone would want to create a massive “here’s all the best RAG posts” for the forum…
After a year of working on my RAG system, I could not disagree more heartily.
Here are a couple of information sources I have found useful:
- [2312.10997] Retrieval-Augmented Generation for Large Language Models: A Survey
No idea what you’re talking about. But if that is your use requirement, I can certainly see your concern.
I just want to make the point that I have found RAG extremely useful in my use case: creating keyword- and semantic-searchable knowledge bases consisting of thousands of documents, focused on very specific areas of interest.
Biggest problem I’ve run into so far: some query responses are not comprehensive enough. End-users can almost always get to a complete answer using chain-of-thought queries (few-shot). But the end-users I’ve been working with want complete answers on the first question (zero-shot). This may touch on your issue.
My resolution: Deep Dive. Have the model dig through all the possible responses, categorize and analyze them, then return a complete list of the best responses. Since I built my RAG system myself, I also have to build this feature. So whatever technique you say is missing, you may have to build it yourself.
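One way such a "Deep Dive" pass could be structured is as a map/reduce over the corpus: extract candidate answers from every batch of chunks (not just the top retrieval hits), then have the model merge them. This is only a rough sketch of that idea, not the poster's actual implementation; `ask_model` is a stand-in for whatever completion call you use.

```python
# Rough map/reduce sketch of a "Deep Dive" style pass, assuming you already
# have a chunked corpus and an `ask_model(prompt) -> str` function (stand-in).

def deep_dive(question, chunks, ask_model, batch_size=5):
    # Map: pull a candidate answer out of every batch of chunks.
    candidates = []
    for i in range(0, len(chunks), batch_size):
        batch = "\n---\n".join(chunks[i:i + batch_size])
        answer = ask_model(f"Using only this text:\n{batch}\n\nAnswer: {question}")
        if answer:
            candidates.append(answer)
    # Reduce: merge the partial answers into one complete, de-duplicated response.
    return ask_model(
        "Combine these partial answers into one complete, de-duplicated answer:\n"
        + "\n".join(f"- {c}" for c in candidates)
    )
```

The obvious cost is latency and tokens, since you touch every chunk rather than the top-k, which is why it fits a "dig through everything" feature rather than an interactive query path.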
I have felt this way. A lot. But the truth is, so much of it is trial and error and fine-tuning the techniques that work best for your use case. And perhaps, in the end, that is the science of it all.
I find RAG extremely useful for giving the model more context. My use case is using an LLM as a classifier on text data — one that not only uses ML techniques but also “knows” something about what the texts are about. E.g., I have a dataset of texts with associated numbers (encoded expense accounts, for example), and when a new text arrives I need to find its number. Classical classifiers showed accuracy of 0.6–0.7. With RAG, I supply the 5–10 most similar texts (based on embeddings + cosine similarity) together with their numbers, so the prompt is formed dynamically every time, and the model returns numbers for new lines with accuracy of 0.8–0.9.
I assume that, on top of what a classical classifier does, the LLM “reads” the texts, which gives it more understanding of what it’s returning vs. a classical model.
So my “technique” here is to tune the prompt every time I send it and to monitor the accuracy afterwards.
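The dynamic-prompt approach described above can be sketched in a few lines. This is an illustrative version, not the poster’s code: `embed` is a stand-in for your embedding call (e.g. an OpenAI embeddings request), and `labelled` is the historical (text, account number) data.

```python
import math

# Sketch of dynamic few-shot classification: retrieve the most similar labelled
# texts by cosine similarity and build the prompt from them on every request.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_prompt(new_text, labelled, embed, k=5):
    """labelled: list of (text, account_number) pairs; embed: text -> vector."""
    query = embed(new_text)
    ranked = sorted(labelled, key=lambda ex: cosine(embed(ex[0]), query), reverse=True)
    examples = "\n".join(f"Text: {t}\nAccount: {n}" for t, n in ranked[:k])
    return f"{examples}\nText: {new_text}\nAccount:"
```

In practice you would precompute and cache the embeddings for `labelled` (or keep them in a vector store) rather than re-embedding on every call.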
Thanks a lot. Are you using ada embeddings? I found issues with embeddings as well, where the API does not return the same embedding every time. I switched to encoding_format=float, but that is also not 100% consistent. I wanted to try Cohere and see whether it is better than ada.
Yes, but I use it via the Weaviate text2vec-openai transformer. In my experience, I have received fairly consistent cosine-similarity results.
I will say that RAG is indeed useful; in fact, it’s the key to grounding the model and giving it memory. I’m assuming your task, @joyasree78, is Q&A related, and what you have to realize is that that task is simply a compression problem.
The model will actually do a great job of answering almost any question but it needs to see the text with the answer in its context window. If you’re getting poor answers you’re likely not showing the model text that contains the answer.
I could go on and on about the flaws in current RAG techniques (I’m building a company to address them), but what I’d suggest is to look at the queries where the model came back with a bad answer. Was the answer in the prompt text? The model sometimes misses things — it’s rare, but models aren’t perfect.
More often than not, you’ll find that you’re simply not showing the model the text that contains the answer, and this is when it has to guess (hallucinate). The model always wants to answer you. That’s both its strength and its weakness, because it’s a general-purpose model trying to cover a wide range of scenarios.