Compare 2 long texts using GPT

a15 · April 27, 2023, 2:07pm

Hello,

How can I compare two long texts using GPT? I am interested in the following questions: 1. What is new in this article compared to the articles previously marked as read, point by point? 2. What in this article has already appeared in previously read articles, point by point? I guess the solution may involve a combination of LangChain, a vector DB, and GPT, but it is not entirely clear how to formulate the above questions. I would welcome any ideas.

I broke the text of the first article into pieces,
I got embeddings for them with the GPT API,
Saved them in Pinecone,
I sent queries to Pinecone and then passed the results to the GPT API and received human-like responses.
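
For readers following along, here is a minimal sketch of the "broke the text into pieces" step. The post does not say how the text was split, so everything here is an assumption: `chunkText` is a hypothetical helper, and the 1000-character default is an arbitrary limit. It splits on blank lines and packs paragraphs into chunks, which could then be sent for embedding.

```javascript
// Hypothetical helper (not from the original post): split text into chunks of
// at most maxChars characters, breaking on paragraph boundaries where possible.
function chunkText(text, maxChars = 1000) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0);

  const chunks = [];
  let current = '';

  for (const para of paragraphs) {
    // A single paragraph longer than the limit is cut into fixed-size slices.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }

    // Otherwise, append the paragraph to the current chunk if it still fits.
    const candidate = current ? current + '\n\n' + para : para;
    if (candidate.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraph boundaries rather than at fixed offsets keeps each chunk closer to one self-contained idea, which tends to matter when the chunks are later compared by meaning.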

kevin6 · April 27, 2023, 2:26pm

1- Outline the document into key points.
2- Turn each key point into an embedding using text-embedding-ada-002 (it works well for this task).
3- Calculate the distance between the points.
4- Find duplicated documents.
5- Find new key points.

I have no idea why people use LangChain, but you do need text-embedding-ada-002.

// Returns true when the two arrays share at least two identical elements.
// Note: this is an exact (===) comparison, so it only finds literal matches.
const CompareTwoPhrase = (array1, array2) => {
    const correctList = []
    for (let i = 0; i < array1.length; i++) {
        for (let j = 0; j < array2.length; j++) {
            if (array1[i] === array2[j]) {
                correctList.push(array1[i])
            }
        }
    }
    return correctList.length >= 2
}

/*
1. We're creating a new array called correctList.
2. We're looping through the first array and checking if the current element is in the second array.
3. If the current element is in the second array, we push it into the correctList array.
4. If the correctList array has at least two elements, we return true.
5. Otherwise (fewer than two elements), we return false.
*/
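
A minimal sketch of steps 3 to 5 above, assuming each key point has already been embedded (for example with text-embedding-ada-002) and is stored as an object with `text` and `embedding` fields. The function names, the object shape, and the 0.9 threshold are all assumptions for illustration, not something from the post; the threshold in particular would need tuning on real data. It uses cosine similarity rather than the exact-match comparison shown above.

```javascript
// Cosine similarity between two equal-length, non-zero numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: sort the key points of a new article into those that
// closely match a previously read key point ("seen") and those that do not
// ("fresh"). Each point is assumed to look like { text, embedding }.
function splitNewAndSeen(newPoints, readPoints, threshold = 0.9) {
  const result = { seen: [], fresh: [] };
  for (const point of newPoints) {
    // Best similarity against any previously read key point.
    let best = -1;
    for (const read of readPoints) {
      best = Math.max(best, cosineSimilarity(point.embedding, read.embedding));
    }
    if (best >= threshold) {
      result.seen.push(point.text);
    } else {
      result.fresh.push(point.text);
    }
  }
  return result;
}
```

The `fresh` list corresponds to "what is new in this article" and the `seen` list to "what has already been in previously read articles". With a vector DB such as Pinecone, the inner loop would typically be replaced by a nearest-neighbor query.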

a15 · April 28, 2023, 8:35am

Thanks for the answer,
I would like to avoid comparing the new article by key points, and instead give the entire content of the new article for comparison.

naveedhakim3899 · June 27, 2023, 10:42am

Hi, were you able to do this? I am looking for a similar solution.

nishaworking149 · August 1, 2023, 7:45am

I am looking for a similar solution. Has anyone cracked it?

vb · August 1, 2023, 7:51am

Idea: start with a classical line-by-line comparison. If a difference is found, return the whole paragraph for further analysis by an LLM.
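
A rough sketch of that idea, under some assumptions: paragraphs are separated by blank lines, and "line-by-line comparison" is simplified to checking whether each line of the new text appears anywhere in the old text (a set lookup, not a true positional diff). `changedParagraphs` is a hypothetical name. The returned paragraphs are the ones you would pass on to the LLM.

```javascript
// Hypothetical helper: return the paragraphs of newText that contain at least
// one line not present anywhere in oldText. Whitespace differences are ignored.
function changedParagraphs(oldText, newText) {
  // Collapse runs of whitespace so trivial spacing changes do not count.
  const normalize = line => line.trim().replace(/\s+/g, ' ');

  const oldLines = new Set(
    oldText
      .split('\n')
      .map(normalize)
      .filter(line => line.length > 0)
  );

  return newText
    .split(/\n\s*\n/)
    .map(p => p.trim())
    .filter(p => p.length > 0)
    .filter(p =>
      p
        .split('\n')
        .map(normalize)
        .some(line => line.length > 0 && !oldLines.has(line))
    );
}
```

This only catches literal differences, so it suits comparing two versions of the same document. For two independently written articles on the same topic, nearly every paragraph would be flagged, and an embedding-based comparison is probably the better first pass.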

markgermaine91 · August 15, 2023, 1:27pm

Hi,

You can use LangChain Agents for this. Create custom tools so that Tool 1 is a retriever for your first vector store and Tool 2 is a retriever for your second vector store. The LLM then compares the results.
