Compare 2 long texts using GPT

Hello,

How can I compare two long texts using GPT? I am interested in the following questions: 1. What is new in this article compared to the articles previously marked as read, point by point? 2. What in this article has already appeared in previously read articles, point by point? I guess the solution may involve a combination of LangChain, a vector DB, and GPT, but it is not entirely clear how to formulate the above questions. I would be glad to hear any ideas.

  1. I broke the text of the first article into pieces.
  2. I generated embeddings for the pieces with the GPT API.
  3. I saved the embeddings in Pinecone.
  4. I sent queries to Pinecone and then passed them, together with the results obtained, to the GPT API and received human-like responses.
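Step 1 above (breaking the article into pieces) could look roughly like the sketch below. The function name `chunkText` and the parameters `maxWords` and `overlap` are made up for illustration, and the default sizes are assumptions, not recommended values. The embedding and Pinecone calls are deliberately left out.

```javascript
// Split a long text into overlapping chunks of at most `maxWords` words.
// `chunkText`, `maxWords` and `overlap` are hypothetical names; the
// default sizes are placeholders to tune for your own texts.
function chunkText(text, maxWords = 200, overlap = 20) {
  if (overlap >= maxWords) {
    throw new Error("overlap must be smaller than maxWords");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    // Stop once the current chunk has reached the end of the text.
    if (start + maxWords >= words.length) break;
  }
  return chunks;
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, which may help retrieval.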

1- Outline the document into key points.
2- Turn each key point into an embedding using text-embedding-ada-002 (it works well for this task).
3- Calculate the distance between points.
4- Find duplicated documents.
5- Find new key points.

I have no idea why people use LangChain, but you do need text-embedding-ada-002.
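Steps 3 to 5 above might be sketched as follows, assuming the embeddings have already been fetched and each key point is stored as `{ text, embedding }`. The function names and the `threshold` of 0.9 are assumptions for illustration; a sensible cutoff has to be found by trying it on real articles.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Split the new article's key points into "already seen" and "new".
// `classifyKeyPoints` and `threshold` are hypothetical; 0.9 is a guess.
function classifyKeyPoints(newPoints, readPoints, threshold = 0.9) {
  const seen = [];
  const fresh = [];
  for (const point of newPoints) {
    let best = -1;
    let match = null;
    // Find the most similar key point among the previously read ones.
    for (const old of readPoints) {
      const score = cosineSimilarity(point.embedding, old.embedding);
      if (score > best) {
        best = score;
        match = old.text;
      }
    }
    if (match !== null && best >= threshold) {
      seen.push({ text: point.text, matchedText: match, score: best });
    } else {
      fresh.push({ text: point.text });
    }
  }
  return { seen, fresh };
}
```

The `fresh` list answers "what is new in this article" and the `seen` list answers "what has already been covered", each point by point.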

// Returns true when at least two matching pairs of elements are found
// between the two arrays (exact, case-sensitive comparison).
const CompareTwoPhrase = (array1, array2) => {
    const correctList = []
    for (let i = 0; i < array1.length; i++) {
        for (let j = 0; j < array2.length; j++) {
            if (array1[i] === array2[j]) {
                correctList.push(array1[i])
            }
        }
    }
    return correctList.length >= 2
}

/*
1. We're creating a new array called correctList.
2. We're looping through the first array and checking if the current element is in the second array.
3. If the current element is in the second array, we push it into the correctList array.
4. If the correctList array has at least two elements, we return true.
5. If the correctList array has fewer than two elements, we return false.
*/

Thanks for the answer,
I would like to avoid comparing the new article by key points, and give the entire content of the new article for comparison.

Hi, were you able to do this? I am looking for a similar solution.

I am looking for a similar solution. Has anyone cracked it?


Idea: start with a classical line-by-line comparison. If a difference is found, return the whole paragraph for further analysis by an LLM.
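A minimal sketch of that idea, under some assumptions: paragraphs are separated by blank lines, and a line counts as "different" if it does not occur anywhere in the old text. This is a simple set lookup rather than a true positional diff, and the names `normalizeLine` and `changedParagraphs` are made up for illustration.

```javascript
// Normalize a line so whitespace-only differences are ignored.
const normalizeLine = (line) => line.trim().replace(/\s+/g, " ");

// Return the paragraphs of `newText` that contain at least one line
// not present anywhere in `oldText`. Only these would go to the LLM.
function changedParagraphs(oldText, newText) {
  const oldLines = new Set(
    oldText.split("\n").map(normalizeLine).filter(Boolean)
  );
  return newText
    .split(/\n\s*\n/) // assumption: blank lines separate paragraphs
    .map((paragraph) => paragraph.trim())
    .filter(Boolean)
    .filter((paragraph) =>
      paragraph
        .split("\n")
        .map(normalizeLine)
        .filter(Boolean)
        .some((line) => !oldLines.has(line))
    );
}
```

Only the returned paragraphs need to be sent to the model, which keeps the prompt short when most of the text is unchanged.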

Hi,

You can use LangChain Agents for this. Create custom tools so that Tool 1 is a retriever for your first vector store and Tool 2 is a retriever for your second vector store. The LLM then compares the results.
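To make the two-retriever layout concrete, here is a library-free sketch. It does not use LangChain's API: `makeRetriever` and `buildComparisonPrompt` are hypothetical stand-ins for the two tools and for the text the LLM would receive, and the in-memory arrays stand in for real vector stores. Scoring uses a plain dot product, which assumes the embeddings are already normalized.

```javascript
// Dot product of two equal-length vectors (assumes normalized embeddings).
const dot = (a, b) => a.reduce((sum, value, i) => sum + value * b[i], 0);

// Hypothetical retriever factory: wraps an in-memory list of
// { text, embedding } records as a stand-in for a real vector store.
function makeRetriever(records) {
  return (queryEmbedding, k = 2) =>
    records
      .map((r) => ({ text: r.text, score: dot(queryEmbedding, r.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((r) => r.text);
}

// Combine the results of both retrievers into one prompt for the LLM.
// The wording of the prompt is only an example.
function buildComparisonPrompt(question, fromFirst, fromSecond) {
  return [
    `Question: ${question}`,
    "Passages from text 1:",
    ...fromFirst.map((t) => `- ${t}`),
    "Passages from text 2:",
    ...fromSecond.map((t) => `- ${t}`),
    "Compare the two sets of passages and list what is new and what overlaps.",
  ].join("\n");
}
```

In an agent setup, the model would decide when to call each retriever. Here the caller invokes both and passes the results to `buildComparisonPrompt`.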
