I’m using ChatGPT and LangChain to retrieve information from my Confluence database. Everything is working fine except when I ask a question about two different products. Here’s an example:
“Tell me the difference between ProductA and ProductB” → It will only answer by giving me information on one product.
I think this is because it only sends information from one page at a time, not more. Since the information on these two products is on two different pages, it doesn’t link the information from both pages.
Your pipeline probably runs just a single search. Since the information for these two products is in different documents, it won’t be able to compare the two; it will only read one of the pages, or just say, “Sorry, this information is not available.”
I personally think this is likely the issue in your case. Try executing two parallel searches, one for each product, then combine the returned information in your prompt for the comparison.
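A minimal sketch of that approach with LangChain, assuming you already have a `vectorstore` built from your Confluence pages (the model name and the `compare_products` helper are just illustrative placeholders):

```python
# Sketch: run one similarity search per product, then merge the results
# into a single prompt for the comparison. Assumes an existing LangChain
# `vectorstore` (e.g. built with ConfluenceLoader + a vector store).
from langchain_openai import ChatOpenAI

def compare_products(vectorstore, product_a: str, product_b: str) -> str:
    # One search per product instead of a single combined query.
    docs_a = vectorstore.similarity_search(f"What is {product_a}?", k=4)
    docs_b = vectorstore.similarity_search(f"What is {product_b}?", k=4)

    # Label each product's context so the model can keep them apart.
    context = "\n\n".join(
        [f"--- Context for {product_a} ---"]
        + [d.page_content for d in docs_a]
        + [f"--- Context for {product_b} ---"]
        + [d.page_content for d in docs_b]
    )

    llm = ChatOpenAI(model="gpt-4o-mini")
    prompt = (
        "Using only the context below, explain the differences between "
        f"{product_a} and {product_b}.\n\n{context}"
    )
    return llm.invoke(prompt).content
```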
Embedding-based naive RAG generates a single embedding coordinate for your context.
If your context talks about two completely different things, the resulting vector will likely be the average of the two, meaning it might be closer to a third thing than to either of the originals.
E.g. (simplified): you want to compare the number 3 and the number 91, but your embedding vector would search around 47, finding neither 3 nor 91.
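Here is a toy illustration of that averaging effect with made-up vectors (no real embedding model involved, just cosine similarity over invented numbers):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up "embeddings" for three topics.
product_a = np.array([1.0, 0.0, 0.1])
product_b = np.array([0.0, 1.0, 0.1])
unrelated = np.array([0.7, 0.7, 0.0])

# A query mentioning both products lands near their average ...
query = (product_a + product_b) / 2

# ... which is more similar to the unrelated topic than to either product.
print(cosine(query, product_a))  # ~0.71
print(cosine(query, product_b))  # ~0.71
print(cosine(query, unrelated))  # ~0.99
```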
If you’re comparing multiple things, you need a way to keep the vectors apart. With promptable (instruct) embedding models you can do that by specifying which aspect of the input should be most relevant for the embedding (not available on the OpenAI platform at the moment).
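For illustration, this is roughly what that looks like with an open instruct-tuned embedding model via sentence-transformers; the `Instruct: ... Query: ...` prompt format is that model family's own convention, not an OpenAI feature, and the task wording is just an example:

```python
from sentence_transformers import SentenceTransformer

# An instruct-tuned embedding model; queries carry a task instruction.
model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

task = "Retrieve documentation about the named product only"
queries = [
    f"Instruct: {task}\nQuery: ProductA",
    f"Instruct: {task}\nQuery: ProductB",
]

# Two separate, instruction-steered query embeddings instead of one
# averaged vector for the combined question.
query_embeddings = model.encode(queries)
doc_embeddings = model.encode(["ProductA is ...", "ProductB is ..."])
```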
One way to deal with this in the OpenAI ecosystem is, as @jr.2509 mentioned, to provide a search tool and have it be called twice, once per product.
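A sketch of that tool-calling pattern with the OpenAI Python SDK; `search_confluence` here is a placeholder for your own retrieval function, and the model name is just an example:

```python
import json
from openai import OpenAI

client = OpenAI()

def search_confluence(query: str) -> str:
    # Placeholder: replace with your actual vector search over Confluence.
    return f"(search results for: {query})"

tools = [{
    "type": "function",
    "function": {
        "name": "search_confluence",
        "description": "Search the Confluence knowledge base for one topic.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user",
             "content": "Tell me the difference between ProductA and ProductB"}]

# First pass: the model can emit several tool calls, one per product.
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools)

msg = response.choices[0].message
messages.append(msg)
for call in msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": search_confluence(args["query"]),
    })

# Second pass: the model now has search results for both products.
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```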