How can I compare long texts using GPT? I am interested in two questions: 1. What is new in this article compared to the articles previously marked as read, point by point? 2. What in this article has already appeared in previously read articles, point by point? I suspect the solution involves a combination of LangChain, a vector DB, and GPT, but it is not entirely clear how to formulate the questions above. I would be glad to hear any ideas.
I split the text of the first article into chunks,
generated embeddings for them with the GPT API,
saved them in Pinecone,
then queried Pinecone and sent the query, together with the results obtained, to the GPT API, which returned human-like responses.
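The first step, splitting the article into pieces, might look roughly like the sketch below. The function name `chunkText` and the `chunkSize` and `overlap` defaults are assumptions chosen for illustration, not values from this thread; the overlap is there so that a point that straddles a chunk boundary still appears whole in at least one chunk.

```javascript
// Split text into overlapping chunks of at most `chunkSize` characters.
// `chunkSize` and `overlap` are illustrative defaults (assumptions).
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error('overlap must be smaller than chunkSize')
  }
  const chunks = []
  const step = chunkSize - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize))
    // Stop once the current chunk reaches the end of the text
    if (start + chunkSize >= text.length) {
      break
    }
  }
  return chunks
}
```

Each chunk would then be embedded and stored in the vector DB with an identifier of the article it came from, so that query results can be traced back to a specific read article.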
1- Outline the document into key points.
2- Turn each key point into an embedding using text-embedding-ada-002 (it works well for this task).
3- Calculate the distance between the key points.
4- Find duplicated documents.
5- Find new key points.
I have no idea why people use LangChain, but you do need text-embedding-ada-002.
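Steps 2 to 5 above could be sketched in plain JavaScript as follows. This is only a sketch under assumptions: the helper names (`embedTexts`, `cosineSimilarity`, `classifyKeyPoints`) are made up for illustration, cosine similarity is used as the distance measure, and the `0.9` threshold is a placeholder that would need tuning on real data. `embedTexts` calls the OpenAI embeddings endpoint and assumes Node 18+ for the global `fetch`.

```javascript
// Step 2: request embeddings for an array of key-point strings.
async function embedTexts(texts, apiKey) {
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ model: 'text-embedding-ada-002', input: texts }),
  })
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`)
  }
  const json = await response.json()
  return json.data.map((item) => item.embedding)
}

// Step 3: cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Steps 4 and 5: a key point counts as already seen when its best match
// among the read key points reaches the threshold (an assumed value).
// Each point is an object of the form { text, embedding }.
function classifyKeyPoints(newPoints, readPoints, threshold = 0.9) {
  const seen = []
  const fresh = []
  for (const point of newPoints) {
    let best = -Infinity
    for (const old of readPoints) {
      best = Math.max(best, cosineSimilarity(point.embedding, old.embedding))
    }
    if (best >= threshold) {
      seen.push(point.text)
    } else {
      fresh.push(point.text)
    }
  }
  return { seen, fresh }
}
```

Here `seen` would answer the "what has already been in previously read articles" question and `fresh` the "what is new" question. For a large archive of read articles, the inner loop would be replaced by a nearest-neighbour query against the vector DB.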
const CompareTwoPhrase = (array1, array2) => {
  // A Set gives fast lookups and ignores duplicates in array2
  const lookup = new Set(array2)
  const correctList = []
  // Iterate over distinct elements so duplicates in array1 are counted once
  for (const item of new Set(array1)) {
    if (lookup.has(item)) {
      correctList.push(item)
    }
  }
  // The phrases match when they share at least two distinct elements
  return correctList.length >= 2
}
/*
1. We're creating a new array called correctList.
2. We're looping through the distinct elements of the first array and checking if the current element is in the second array.
3. If the current element is in the second array, we push it into the correctList array.
4. If the correctList array has at least two elements, we return true.
5. If the correctList array has fewer than two elements, we return false.
*/
You can use LangChain Agents for this. Create custom tools so that Tool 1 is a retriever for your first vector store and Tool 2 is a retriever for your second vector store. The LLM then compares the results.