How to find similarity between 2 sets of conversations

I have a text conversation involving 2 people. I have another text conversation involving another 2 people. I want to find the similarity between the 2 conversations. How can I do this? How do I model this problem?

This seems like a task suited to LLMs. Give the model one transcribed interaction wrapped in a marker, say ### first transcript ###, and the next in /// second transcript ///, then ask it to find similarities (“Now use the conversation in ### markers and compare it to the conversation in /// markers and tell me any similarities”). Assuming your transcripts fit within the model’s prompt context limit, that should work fine.
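A minimal sketch of how such a prompt could be assembled in Python. The function name `build_comparison_prompt` and the sample transcripts are made up for illustration; the resulting string would be sent to whatever model or API you use.

```python
def build_comparison_prompt(first_transcript: str, second_transcript: str) -> str:
    """Wrap each transcript in its own marker and append the instruction."""
    return (
        f"### {first_transcript} ###\n\n"
        f"/// {second_transcript} ///\n\n"
        "Now use the conversation in ### markers and compare it to the "
        "conversation in /// markers and tell me any similarities."
    )


# Placeholder transcripts, only for illustration.
first = "User 1 - hi\nUser 2 - hello! How are you?\nUser 1 - I am good"
second = "User A - What you had for lunch?\nUser B - Nothing just some pasta, you?\nUser A - Pizza!"

prompt = build_comparison_prompt(first, second)
print(prompt)
```

If a transcript could itself contain `###` or `///`, pick markers that do not appear in your data.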

@Foxabilo thanks for the answer!

Conversation 1:
User 1 - hi
User 2 - hello! How are you?
User 1 - I am good

Conversation 2:
User A - What you had for lunch?
User B - Nothing just some pasta, you?
User A - Pizza!

Find similarity between conversation 1 and 2
Based on this example, what will be the input to the LLM model?

Can each conversation be converted into a block of text, something like this? Will it be efficient?
Conversation 1:
“User 1 hi User 2 hello! How are you? User 1 I am good”
and then used for fine-tuning and inference?

The AI does not care much about the format of the text, so sure, you can turn it into a block, so long as the context remains intact.
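As a rough sketch of flattening a conversation into one block while keeping the context (who said what, in which order) intact: the helper name `flatten_conversation` and the `speaker: message` layout are assumptions for illustration, not a required format.

```python
def flatten_conversation(turns):
    """Join (speaker, message) pairs into one line, keeping labels and turn order."""
    return " ".join(f"{speaker}: {message}" for speaker, message in turns)


conversation_1 = [
    ("User 1", "hi"),
    ("User 2", "hello! How are you?"),
    ("User 1", "I am good"),
]

block = flatten_conversation(conversation_1)
print(block)  # User 1: hi User 2: hello! How are you? User 1: I am good
```

Keeping the speaker labels is what preserves the context; dropping them would merge both speakers into one undifferentiated text.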

As for fine-tuning, you get the best results with question-and-answer pairs, i.e., prompt and completion.

The best way to think of fine-tuning is as teaching the AI new patterns: you show it examples of typical questions and answers, and it can then infer new ways of thinking and new information.

If you can format your data in that way, then it will be useful.
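A hedged sketch of what prompt and completion pairs for this task might look like, written out as JSON Lines. The field names `prompt` and `completion` follow the wording above, the helper `to_training_line` is hypothetical, and the completion text is a made-up label; check the exact format your fine-tuning provider expects.

```python
import json


def to_training_line(conversation_a: str, conversation_b: str, completion: str) -> str:
    """Build one JSON line holding a prompt (both conversations) and a completion."""
    prompt = (
        f"Conversation 1:\n{conversation_a}\n\n"
        f"Conversation 2:\n{conversation_b}\n\n"
        "How similar are these conversations?"
    )
    return json.dumps({"prompt": prompt, "completion": completion})


# Made-up training example; real data needs your own similarity judgments.
examples = [
    (
        "User 1: hi User 2: hello! How are you? User 1: I am good",
        "User A: What you had for lunch? User B: Nothing just some pasta, you? User A: Pizza!",
        "Both are short, casual two-person exchanges; the topics differ.",
    ),
]

jsonl = "\n".join(to_training_line(*example) for example in examples)
print(jsonl)
```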

First, you have to define “similarity.”

For example, you can calculate an embedding for each of the conversations, using something like text-embedding-ada-002, and then calculate the cosine similarity between the embeddings to estimate similarity (for unit-length embeddings, this is a simple dot product).

But, that’s just one particular interpretation of “similarity.”
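The embedding comparison can be sketched in plain Python as follows. The two vectors are tiny stand-ins chosen for illustration; real embeddings would come from your embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in embeddings (both have length 1, like normalized embeddings).
embedding_1 = [0.6, 0.8, 0.0]
embedding_2 = [0.8, 0.6, 0.0]

similarity = cosine_similarity(embedding_1, embedding_2)
print(similarity)  # roughly 0.96; values near 1 mean "more similar"
```

Because both stand-in vectors have length 1, the plain dot product gives the same number here, which is why the two are often used interchangeably for normalized embeddings.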

I have done this by taking two texts (conversations, chats, what have you), passing them to the API, and prompting it to ‘compare’ them. You can then tune the prompt to do the comparison scientifically, by length, or by any other means you have. Testing what comes out helps a lot.

Python/Django at server side:

    # Runs inside a Django view, so `request` is the incoming HttpRequest.
    doc_one = request.GET.get("comparedocsone", "")
    doc_two = request.GET.get("comparedocstwo", "")

    # Each document is cut to its first 8000 characters to bound the prompt size.
    turbomode_messages = [
        {
            "role": "system",
            "content": "You can compare two documents scientifically",
        },
        {
            "role": "user",
            "content": f"read document {doc_one[:8000]} and document {doc_two[:8000]}. Reply with an idea of a "
                       f"prompt to compare them with some scientific approach.",
        },
    ]
    # get_completion_chat and model_chat are defined elsewhere (not shown in this snippet).
    reply = get_completion_chat(model_chat, turbomode_messages)