Here is a notional chart for deciding whether to use RAG, fine-tuning, or a hybrid RAG + fine-tuning approach (source)
So, indeed, it looks like a hybrid option might be the best way to prevent hallucinations.
7 Likes
udm17
5
That chart is good and so helpful. It really makes explaining this to others much easier. Thanks for sharing. Cheers!
2 Likes
_j
6
Here is a nonsensical chart
5 Likes
shamy
7
Thanks, Paul. Temperature is set to 0. I have also given more stringent instructions not to answer outside the context, but it still does.
When I pass the conversation history, I am not passing the context that was used to generate the answers, only the queries and responses. I suspect the model looks at the history, concludes that it hallucinated the past answers, and then starts to hallucinate itself. Do you think that is a possibility?
It’s not that so much as the previous responses containing similar tokens, which begin to form a pattern the LLM picks up on, if that makes sense. I would work on the prompt a bit more.
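For example, something along these lines can tighten the grounding. This is only a minimal sketch assuming the standard chat-completions message format; the prompt wording and function name are hypothetical, not anything anyone here is actually running:

```python
# Minimal sketch of a stricter grounding setup (hypothetical prompt wording and
# function name; assumes the standard chat-completions message format).
GROUNDED_SYSTEM_PROMPT = (
    "Answer ONLY from the text in the CONTEXT section of the latest user message.\n"
    "If the answer is not fully contained in CONTEXT, reply exactly:\n"
    '"I cannot answer that from the provided documents."\n'
    "Never treat earlier turns of this conversation as a source of facts."
)

def build_messages(context: str, history: list[dict], question: str) -> list[dict]:
    """Assemble the payload: history is kept for conversational flow only."""
    return [
        {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
        *history,  # prior user/assistant turns (queries and responses only)
        {"role": "user", "content": f"CONTEXT:\n{context}\n\nQUESTION: {question}"},
    ]
```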
We’re doing this, focusing on RAG debugging at WhyHow.AI. We plug into any existing LLM/RAG system you have and focus on creating a knowledge graph based on the feedback that your tester/PM has on an output.
We’ve reduced hallucinations by 80%, debugged errors in seconds with only natural language, and reduced the time to send systems into production. Happy to see where we could be helpful!
shamy
10
Could you please suggest what we need to do with the prompt to avoid this?
You’d have to build something more sophisticated than a prompt, unfortunately. Prompting alone can only take you so far in reducing hallucinations, and that’s something we’ve faced across multiple clients. (Also, at some point the number of hallucinations you have to squash makes putting more chat history into the context window infeasible from a cost perspective, and I think you’ve seen it doesn’t work too well.)
I’m happy to hop on a call to advise on how to solve this. We basically started deploying a hybrid knowledge graph + LLM model; Jerry from LlamaIndex has been promoting this method aggressively on Twitter recently.
1 Like
What do you mean by this? I looked up Jerry’s posts on the web and found this. Is this what you mean?
Hey!
Not that one. That one describes a more structured process for vector retrieval, which wouldn’t be a good fit for your specific example.
Here’s a great list that Jerry came up with, although it’s incomplete. We’ve built all of these for various purposes, but found the Custom Combo Query Engine to be super powerful and general (i.e. you can use logical deduction, prevent hallucinations, and do multi-hop RAG with the same engine).
For Shamy’s problem, though, I don’t think you need to go that far. Simply insert a knowledge graph as a post-processing step and add the rule to the KG in natural language (so it takes literal seconds to set up), e.g. ‘X student’s curriculum is not Y’, and the LLM should output ‘I cannot answer your query’. It works a bit like an RLHF loop for debugging and is remarkably effective. It isn’t in the list above because you’re not using the KG for answer retrieval, you’re using it for non-answer retrieval, which is an interesting nuance that reduces the amount of tech you need to set up.
That would be the best fix in my opinion.
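To make that concrete, here is a rough sketch of the idea with an assumed Neo4j schema and hypothetical names; it is not WhyHow.AI’s actual implementation. The draft RAG answer is checked against negative rules stored in the graph and suppressed when a rule matches:

```python
# Rough sketch of a KG guardrail as a post-processing step (assumed Neo4j schema
# and hypothetical names, not WhyHow.AI's actual implementation).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def rule_blocks_answer(student: str, curriculum: str) -> bool:
    """Check the graph for a stored rule such as 'X student's curriculum is not Y'."""
    query = (
        "MATCH (s:Student {name: $student})-[:CURRICULUM_IS_NOT]->"
        "(c:Curriculum {name: $curriculum}) RETURN count(*) > 0 AS blocked"
    )
    with driver.session() as session:
        return session.run(query, student=student, curriculum=curriculum).single()["blocked"]

def postprocess(draft_answer: str, student: str, curriculum: str) -> str:
    # Entity extraction (which student/curriculum the answer refers to) is assumed
    # to have happened upstream; here we only enforce the negative rule.
    if rule_blocks_answer(student, curriculum):
        return "I cannot answer your query."
    return draft_answer
```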
2 Likes
What is your Knowledge Graph? Is it a vector database?
1 Like
Nope, we use a vector database and a Neo4j database in parallel. The company information is stored in the vector database; the rules we want enforced are stored in the Neo4j database. We use a system somewhat similar to ‘The GenAI Stack’ that Justin Cormack announced at DockerCon, which links LangChain and Neo4j together. (I’d include a link, but I’m not allowed to post links here.)
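As a rough sketch of how the two stores can be queried side by side (assumed interfaces and hypothetical names, not our production code): the vector store handles document similarity, Neo4j returns the enforced rules, and both results are merged for the prompt.

```python
# Rough sketch of querying the two stores in parallel (assumed interfaces and
# hypothetical names): documents from the vector database, rules from Neo4j.
from concurrent.futures import ThreadPoolExecutor
from neo4j import GraphDatabase

graph_driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def fetch_documents(vector_store, question: str) -> list[str]:
    # 'vector_store' stands in for any LangChain-style store with similarity_search().
    return [doc.page_content for doc in vector_store.similarity_search(question, k=4)]

def fetch_rules(entity: str) -> list[str]:
    # Rules are modelled here as relationships carrying a natural-language 'text' property.
    with graph_driver.session() as session:
        rows = session.run(
            "MATCH (e {name: $entity})-[r:RULE]->() RETURN r.text AS rule",
            entity=entity,
        )
        return [row["rule"] for row in rows]

def retrieve(vector_store, question: str, entity: str) -> dict:
    """Run both lookups concurrently and hand the merged result to the LLM prompt."""
    with ThreadPoolExecutor() as pool:
        docs = pool.submit(fetch_documents, vector_store, question)
        rules = pool.submit(fetch_rules, entity)
        return {"documents": docs.result(), "rules": rules.result()}
```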
6 Likes
Do you also have a decision layer that queries this only when it can expect the right data?
Sorry to be pedantic but what does ‘this’ refer to? I assume you mean ‘this’ = ‘knowledge graph’ vs a vector database query engine?
The short answer is yes, but the exact structure should depend on what you’re looking to do. You can certainly set it up so that it only queries the knowledge graph for problematic questions.
Nah, I just wanted to ask if you have that already. Thought you might need some help.
Nice visualisation though.
I maintain the chat history, but I don’t send it to the model with the question. Instead, I use it to create a standalone question, which I send to the model along with the context documents. I’ve never had any problems with hallucination in my RAG implementation. When the answer isn’t in the retrieved documents, the model (whether gpt-3.5, gpt-3.5-turbo, or gpt-4) always responds as it is instructed by the system prompt (which is also always sent with the standalone question).
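Roughly, the flow looks like this. A minimal sketch using the OpenAI Python client, with placeholder prompts and helper names rather than the exact implementation:

```python
# Minimal sketch of the standalone-question flow (placeholder prompts and helper
# names, not the exact implementation). Assumes the OpenAI Python client v1.
from openai import OpenAI

client = OpenAI()

def condense_question(history: list[dict], follow_up: str) -> str:
    """Rewrite a follow-up so it is self-contained, using the chat history."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Rewrite the follow-up as a standalone question. Output only the question."},
            {"role": "user", "content": f"History:\n{transcript}\n\nFollow-up: {follow_up}"},
        ],
    )
    return resp.choices[0].message.content

def answer(system_prompt: str, standalone_question: str, context_docs: list[str]) -> str:
    """Send only the system prompt, the standalone question, and the retrieved documents."""
    user_content = (
        "Context documents:\n" + "\n\n".join(context_docs)
        + f"\n\nQuestion: {standalone_question}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return resp.choices[0].message.content
```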
3 Likes
Kobiel
20
Have you used any tools in your RAG implementation? I’ve noticed that a chat model starts to hallucinate more often when it has more ‘power’. I also didn’t notice any hallucinations in a simple RAG app that only retrieves data from my documents, but when I added one more tool, the model sometimes ignores instructions and does whatever it wants.
To their credit, I have not noticed it in gpt-4 or gpt-3.5-turbo-16k. I added a keyword tool to narrow down the context documents returned.
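The keyword tool itself can be very simple. A small sketch with assumed logic and hypothetical names, not the actual tool:

```python
# Small sketch of a keyword filter for narrowing retrieved context documents
# (assumed logic and hypothetical names, not the actual tool).
def keyword_filter(documents: list[str], keywords: list[str]) -> list[str]:
    """Keep only documents that mention at least one of the extracted keywords."""
    lowered = [k.lower() for k in keywords]
    return [doc for doc in documents if any(k in doc.lower() for k in lowered)]

# Example: narrowed = keyword_filter(retrieved_docs, ["curriculum", "enrollment"])
```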
Neither of the OpenAI models has hallucinated so far. However, Anthropic Claude does go off script (in a good way):
You’ll note that several passages quoted are NOT in the returned context documents.
2 Likes
Hey, when you say ‘I use it to create a standalone question’, what are you using when the context of the question may be in the history? Do you mean you maintain a list of ‘questions’/intents?
The standalone question, by definition, includes the context of the chat history.
If rumors about the upcoming Dev Conference are to be believed, we will soon be able to use the models themselves to maintain chat history.