Conflicting info across documents

I have five documents that contain information about a particular question. I convert these documents into embeddings and then search them when a user asks the question. From these five results, I only want to create an answer if the documents don’t contradict each other. In other words, if they contradict, I want to say that an answer can’t be created because the information is contradictory. To do so, I will probably need to pass in that context and check whether there is any contradiction before attempting to create an answer. Am I on the right track, or is there a better way?

If you have an embedding vector database and break documents into chunks small enough that the AI can process a few of them at a time, then retrieval might not always return all the information required to conclude that there is a contradiction somewhere in your whole knowledge base.

For example, load the Christian Bible as your database (and assume the AI doesn’t otherwise know its contents). Ask a question about specific morals. You might not actually get all the parts of the document needed to find contradictions. The AI would only be able to answer or refuse based on the chunks that it obtained through semantic search.

If you improve how well the user input is matched against your chunks, by trying different retrieval techniques, then you have a better chance of including the sections that contradict each other.

You could also try to engineer the database returns so that only the best matches are returned, and the parts that could confuse the AI aren’t likely to be included.
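One way to picture "only the best matches" is a similarity cutoff applied on top of the usual top-k limit. This is a minimal sketch in plain Python, not tied to any particular vector database; the function names, the `(text, vector)` tuple layout, and the 0.8 threshold are all assumptions for illustration, and a useful cutoff would have to be tuned on your own data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_chunks(query_vec, indexed_chunks, top_k=5, min_score=0.8):
    """Return up to top_k (score, text) pairs whose score is at least min_score.

    indexed_chunks is a list of (text, vector) tuples (hypothetical layout).
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed_chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:top_k]
```

Keep in mind that a strict cutoff works against contradiction detection: a chunk that disagrees with the others may score lower and be dropped before the AI ever sees it.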


The AI will try to synthesize the best output from the information it is given. You’ll have to tell it specifically to compare each identifiable augmentation (each retrieved passage) against the others if you want a chance of getting the behavior you describe.
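As a rough sketch of what "compare each identifiable augmentation to others" could look like, the code below labels every retrieved passage, lists each pair to compare, and asks for a one-line verdict that is easy to parse. The prompt wording, the `VERDICT:` convention, and both function names are assumptions made up for this example; the call to the model itself is left out, and there is no guarantee a model will always follow the format.

```python
from itertools import combinations

def build_contradiction_prompt(question, chunks):
    """Label each retrieved chunk and ask the model to compare every pair."""
    labeled = [f"[DOC {i}] {text}" for i, text in enumerate(chunks, start=1)]
    pairs = [
        f"DOC {a} vs DOC {b}"
        for a, b in combinations(range(1, len(chunks) + 1), 2)
    ]
    return "\n".join([
        "You are checking retrieved passages for contradictions.",
        f"Question: {question}",
        "",
        "Passages:",
        *labeled,
        "",
        "Compare each of these pairs and note any factual conflict:",
        *pairs,
        "",
        "Finish with exactly one line: "
        "VERDICT: CONSISTENT or VERDICT: CONTRADICTORY.",
    ])

def is_contradictory(model_reply):
    """Treat anything other than an explicit CONSISTENT verdict as a conflict."""
    for line in reversed(model_reply.strip().splitlines()):
        if line.strip().upper().startswith("VERDICT:"):
            verdict = line.split(":", 1)[1].strip().upper()
            return verdict != "CONSISTENT"
    return True  # no verdict found: refuse to answer rather than guess
```

You would send the prompt to your model first, and only run the answer-generation step when `is_contradictory(reply)` returns `False`.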

Understood. In my case it’s a very specific question, as opposed to getting all the morals. For example, I may have three documents from three different dates. The first says Acme integration uses an API. The second says Acme integration currently does not work with Google. The third says Acme integrates with Google via FTP. So my assumption is that when I search the embeddings, all three of them would be returned. I understand that if there are 10 documents with such info and I am only getting 5, then I might not get all the contradictory ones.

If you have documents from different dates, and the later ones may supersede the previous knowledge, that sounds like important information to include in your data chunks!
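A minimal sketch of that idea, assuming each chunk can carry a date at all: store the date next to the text, then put it in front of each passage when building the context so the AI can see which statement is newer. The `Chunk` class, the `written_on` field, and the closing instruction line are hypothetical names and wording chosen for this example.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Chunk:
    text: str
    written_on: date  # hypothetical metadata field stored with each chunk

def format_context(chunks):
    """Order chunks oldest to newest and prefix each with its date."""
    ordered = sorted(chunks, key=lambda c: c.written_on)
    lines = [f"[{c.written_on.isoformat()}] {c.text}" for c in ordered]
    lines.append("Where passages disagree, prefer the most recent date.")
    return "\n".join(lines)
```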

I wish the world were so simple. Unfortunately, documents are not dated.