How to Find similarity between 2 sets of conversation

ritwikkanodia2 · June 8, 2023, 4:05am

I have a text conversation involving 2 people. I have another text conversation involving another 2 people. I want to find the similarity between the 2 conversations. How can I do this? How do I model this problem?

Foxalabs · June 8, 2023, 8:14am

Seems a task suited to LLMs. Just give it one interaction (transcribed) and wrap that in some marker, let’s say ### first transcript ### and then the next in /// second transcript /// then ask the model to find similarities (“Now use the conversation in ### markers and compare it to the conversation in /// markers and tell me any similarities”). Assuming your transcripts fit within the model’s prompt context limit, that should work fine.
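A minimal sketch of how such a prompt could be assembled, assuming the transcripts are already plain strings. The function name `build_comparison_prompt` and the sample transcripts are hypothetical, and sending the prompt to a model is left out:

```python
def build_comparison_prompt(first: str, second: str) -> str:
    """Wrap two transcripts in distinct markers and append the instruction."""
    return (
        f"### {first.strip()} ###\n\n"
        f"/// {second.strip()} ///\n\n"
        "Now use the conversation in ### markers and compare it to the "
        "conversation in /// markers and tell me any similarities."
    )


# Placeholder transcripts, for illustration only.
first_transcript = "A: Did the order arrive?\nB: Yes, this morning."
second_transcript = "C: Has the parcel shown up?\nD: Not yet."

prompt = build_comparison_prompt(first_transcript, second_transcript)
print(prompt)
```

The resulting string would go in as the user message; whether both transcripts fit depends on the context limit of the model you pick.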

ritwikkanodia2 · June 8, 2023, 9:06am

@Foxalabs thanks for the answer!

Conversation 1:
User 1 - hi
User 2 - hello! How are you?
User 1 - I am good

Conversation 2:
User A - What you had for lunch?
User B - Nothing just some pasta, you?
User A - Pizza!

Find similarity between conversation 1 and 2
Based on this example, what will be the input to the LLM model?

ritwikkanodia2 · June 8, 2023, 9:08am

Can each conversation be converted into a block of text, something like this? Will it be efficient?
Conversation 1:
“User 1 hi User 2 hello! How are you? User 1 I am good”
and then used for fine-tuning and inference?

Foxalabs · June 8, 2023, 10:17am

The AI does not care much about the format of the text, so sure, you can turn it into a block, so long as the context remains intact.
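As a rough sketch of what flattening could look like while keeping the context intact, the helper below joins the turns into one line and keeps the speaker labels. The name `flatten_conversation` and the `(speaker, text)` tuple layout are assumptions made for illustration:

```python
def flatten_conversation(turns):
    """Join (speaker, utterance) pairs into one line, keeping speaker labels."""
    return " ".join(f"{speaker}: {text.strip()}" for speaker, text in turns)


turns = [
    ("User 1", "hi"),
    ("User 2", "hello! How are you?"),
    ("User 1", "I am good"),
]
block = flatten_conversation(turns)
print(block)  # User 1: hi User 2: hello! How are you? User 1: I am good
```

Keeping a separator after each speaker label makes it easier to tell where one turn ends and the next begins once the line breaks are gone.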

As for fine-tuning, you get the best results with question and answer pairs, i.e. prompt and completion.

The best way to think of fine-tuning is as teaching the AI new patterns: you show it examples of typical questions and answers, and it can then infer new ways of thinking and new information.

If you can format your data in that way then it will be useful.
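One way this could look for the similarity task, sketched under several assumptions: the `prompt` and `completion` key names simply mirror the wording above, the labels "similar" and "not similar" are made up for illustration, and the exact file format should be checked against whatever fine-tuning service you use.

```python
import json


def make_training_record(conv_a: str, conv_b: str, label: str) -> str:
    """Serialize one prompt/completion pair as a single JSON line."""
    prompt = (
        f"Conversation 1:\n{conv_a}\n\n"
        f"Conversation 2:\n{conv_b}\n\n"
        "Are these conversations similar?"
    )
    return json.dumps({"prompt": prompt, "completion": label})


# Hypothetical hand-labelled examples.
examples = [
    ("User 1 - hi\nUser 2 - hello!", "User A - hey\nUser B - hi there!", "similar"),
    ("User 1 - hi\nUser 2 - hello!", "User A - Any lunch plans?\nUser B - Pasta.", "not similar"),
]

jsonl = "\n".join(make_training_record(a, b, label) for a, b, label in examples)
print(jsonl)
```

Each record holds a pair of conversations as the question and a human-assigned judgement as the answer, so the labels have to come from you.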

jwatte · June 8, 2023, 11:52am

First, you have to define “similarity.”

For example, you can calculate an embedding for each of the conversations, using something like text-embedding-ada-002, and then calculate the cosine similarity between the embeddings (a simple dot product when the vectors are normalized to unit length) to estimate similarity.

But, that’s just one particular interpretation of “similarity.”
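The comparison step of the embedding approach can be sketched in plain Python. This assumes you already have one embedding vector per conversation from whichever embedding model you use; the two short vectors below are made-up placeholders, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder embeddings; real ones have many more dimensions.
embedding_1 = [0.1, 0.3, 0.5]
embedding_2 = [0.2, 0.1, 0.4]

score = cosine_similarity(embedding_1, embedding_2)
print(f"similarity: {score:.3f}")
```

Values closer to 1 mean the two vectors point in nearly the same direction. When the embeddings are already unit length, the division changes nothing and the dot product alone gives the same number.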

jtapiovaara · June 8, 2023, 12:21pm

I have done this by taking two texts (conversations, chats, what have you), passing them to the API, and prompting it to ‘compare’ them. Then you can tune the prompt to do the comparison scientifically, by length, or by any other means you have. Testing what comes out helps a lot.

Python/Django at server side:

    # get_completion_chat and model_chat are project helpers not shown here.
    # Read the two documents from the query string.
    doc_one = request.GET.get("comparedocsone", "")
    doc_two = request.GET.get("comparedocstwo", "")
    turbomode_messages = [
        {
            "role": "system",
            "content": "You can compare two documents scientifically",
        },
        {
            "role": "user",
            # Each document is cut to 8000 characters to limit the prompt size.
            "content": f"read document {doc_one[:8000]} and document {doc_two[:8000]}. Reply with an idea of a "
                       f"prompt to compare them with some scientific approach.",
        },
    ]
    reply = get_completion_chat(model_chat, turbomode_messages)
