I’m using ChatGPT and LangChain to retrieve information from my Confluence database. Everything is working fine except when I ask a question about two different products. Here’s an example:
“Tell me the difference between ProductA and ProductB” → It will only answer by giving me information on one product.
I think this is because it only sends information from one page at a time, not more. Since the information on these two products is on two different pages, it doesn’t link the information from both pages.
Your pipeline probably runs just a single search. Since the information for these two products is in different documents, it won’t be able to compare the two; it will only read one of the pages, or just say, “Sorry, this information is not available.”
I personally think this is likely the issue in your case. Try executing two parallel searches, one for each product, then combine the returned information in your prompt for the comparison.
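A minimal sketch of that approach with LangChain, assuming you already have a `vectorstore` built from your Confluence pages (the model name and the `compare_products` helper are just illustrative placeholders):

```python
# Sketch: run one similarity search per product, then merge the results
# into a single prompt for the comparison. Assumes an existing LangChain
# `vectorstore` (e.g. built with ConfluenceLoader + a vector store).
from langchain_openai import ChatOpenAI

def compare_products(vectorstore, product_a: str, product_b: str) -> str:
    # One search per product instead of a single combined query.
    docs_a = vectorstore.similarity_search(f"What is {product_a}?", k=4)
    docs_b = vectorstore.similarity_search(f"What is {product_b}?", k=4)

    # Label each product's context so the model can keep them apart.
    context = "\n\n".join(
        [f"--- Context for {product_a} ---"]
        + [d.page_content for d in docs_a]
        + [f"--- Context for {product_b} ---"]
        + [d.page_content for d in docs_b]
    )

    llm = ChatOpenAI(model="gpt-4o-mini")
    prompt = (
        "Using only the context below, explain the differences between "
        f"{product_a} and {product_b}.\n\n{context}"
    )
    return llm.invoke(prompt).content
```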
Embedding-based naive RAG generates a single embedding coordinate for your context.
If your context talks about two completely different things, the resulting vector will likely be the average of the two, meaning it might be closer to a third thing than to either of the originals.
E.g. (simplified): you want to compare the number 3 and the number 91, but your embedding vector would search around 47, finding neither 3 nor 91.
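Here is a toy illustration of that averaging effect with made-up vectors (no real embedding model involved, just cosine similarity over invented numbers):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up "embeddings" for three topics.
product_a = np.array([1.0, 0.0, 0.1])
product_b = np.array([0.0, 1.0, 0.1])
unrelated = np.array([0.7, 0.7, 0.0])

# A query mentioning both products lands near their average ...
query = (product_a + product_b) / 2

# ... which is more similar to the unrelated topic than to either product.
print(cosine(query, product_a))  # ~0.71
print(cosine(query, product_b))  # ~0.71
print(cosine(query, unrelated))  # ~0.99
```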
If you’re comparing multiple things, you need a way to keep the vectors apart. With promptable (instruct) embedding models you can do that by specifying which aspect of the input should be most relevant for the embedding (not available on the OpenAI platform at the moment).
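For illustration, this is roughly what that looks like with an open instruct-tuned embedding model via sentence-transformers; the `Instruct: ... Query: ...` prompt format is that model family's own convention, not an OpenAI feature, and the task wording is just an example:

```python
from sentence_transformers import SentenceTransformer

# An instruct-tuned embedding model; queries carry a task instruction.
model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

task = "Retrieve documentation about the named product only"
queries = [
    f"Instruct: {task}\nQuery: ProductA",
    f"Instruct: {task}\nQuery: ProductB",
]

# Two separate, instruction-steered query embeddings instead of one
# averaged vector for the combined question.
query_embeddings = model.encode(queries)
doc_embeddings = model.encode(["ProductA is ...", "ProductB is ..."])
```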
One way to deal with this in the OpenAI ecosystem is, as @jr.2509 mentioned, to provide a search tool and have it be called twice, once per product.
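A sketch of that tool-calling pattern with the OpenAI Python SDK; `search_confluence` here is a placeholder for your own retrieval function, and the model name is just an example:

```python
import json
from openai import OpenAI

client = OpenAI()

def search_confluence(query: str) -> str:
    # Placeholder: replace with your actual vector search over Confluence.
    return f"(search results for: {query})"

tools = [{
    "type": "function",
    "function": {
        "name": "search_confluence",
        "description": "Search the Confluence knowledge base for one topic.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user",
             "content": "Tell me the difference between ProductA and ProductB"}]

# First pass: the model can emit several tool calls, one per product.
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools)

msg = response.choices[0].message
messages.append(msg)
for call in msg.tool_calls or []:
    args = json.loads(call.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": search_confluence(args["query"]),
    })

# Second pass: the model now has search results for both products.
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
```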