Text Similarity Models - Embedding API Query

sandeepkhomne · July 16, 2022, 7:29pm

Hi, I am using the text similarity models in the Embeddings API to compute a similarity score between two words. I am working from the code snippet in this blog post (copied below): Introducing Text and Code Embeddings in the OpenAI API

The blog says: The result is a “similarity score”, sometimes called “cosine similarity”, between –1 and 1, where a higher number means more similarity.

The similarity scores are decimals and fall pretty close to each other, e.g.:
Castle – Palace - the score is 0.88
Building – Palace - the score is 0.85
Laptop – Palace - the score is 0.78

Is there a way to get the score as a percentage between 0 and 100% that reflects the similarity? E.g.:
Castle – Palace - the score should be around 95%
Building – Palace - the score should be around 50%
Laptop – Palace - the score should be around 10%

I tried looking into soft cosine similarity, Euclidean distance, etc., but I am still unable to find a good solution. Unfortunately, I don’t fully understand the math behind it.
Any help is greatly appreciated. Thanks!

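import openai
import numpy as np

# Request embeddings for both inputs in a single call.
resp = openai.Embedding.create(
    input=["feline friends go", "meow"],
    engine="text-similarity-davinci-001")

embedding_a = resp['data'][0]['embedding']
embedding_b = resp['data'][1]['embedding']

# Dot product of the two embedding vectors; this equals cosine
# similarity when both vectors have unit length.
similarity_score = np.dot(embedding_a, embedding_b)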
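
For reference, here is a minimal sketch of cosine similarity with the normalization written out, so it also works for vectors that are not unit length. The helper name cosine_similarity and the toy vectors are made up for illustration; they stand in for real embeddings and are not part of the blog snippet.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, a value in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Divide the dot product by both vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (assumption, not API output).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, opposite)
```

If the embeddings are already unit length, this returns the same value as the plain np.dot call above.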

jhsmith12345 · July 17, 2022, 3:40am

Can you write some logic that turns the decimal into a percentage?

You just have to add 1, then divide by 2 (and multiply by 100 to get a percentage).
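
A minimal sketch of that mapping, assuming the score is a cosine similarity in the range –1 to 1. The function name to_percent_linear is just for illustration.

```python
def to_percent_linear(score):
    """Map a cosine similarity in [-1, 1] linearly onto [0, 100]."""
    return (score + 1.0) / 2.0 * 100.0

# Scores quoted in the original question.
for pair, score in [("Castle/Palace", 0.88),
                    ("Building/Palace", 0.85),
                    ("Laptop/Palace", 0.78)]:
    print(pair, round(to_percent_linear(score), 1))
```

For the scores from the question (0.88, 0.85, 0.78) this gives roughly 94%, 92.5% and 89%, so the values stay bunched together.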

sandeepkhomne · July 17, 2022, 10:11am

Cosine similarity values do not map linearly onto how similar two words actually are, so a direct conversion to a percentage doesn’t yield a useful result. E.g. reading 0.8 and 0.7 directly as percentages gives 80% and 70% (and adding 1 then dividing by 2 gives 90% and 85%). However, the real difference in similarity is much wider, probably something like 80% and 20%.

Regards,
Sandeep Khomne
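
One possible way to get a wider spread like the one described above is to rescale each score against the lowest and highest scores seen in a reference set of comparisons (min-max rescaling). This is only a sketch under that assumption: the function name and the choice of reference set are hypothetical, and the result is a relative ranking within that set, not a calibrated measure of similarity.

```python
def rescale_to_percent(score, reference_scores):
    """Min-max rescale score to 0-100 using the range of reference_scores."""
    lo, hi = min(reference_scores), max(reference_scores)
    if hi == lo:
        # All reference scores are identical; no spread to rescale against.
        return 100.0
    pct = (score - lo) / (hi - lo) * 100.0
    # Clamp scores that fall outside the reference range.
    return max(0.0, min(100.0, pct))

# Reference set: the three scores from the original question (assumption).
reference = [0.88, 0.85, 0.78]
for s in reference:
    print(s, round(rescale_to_percent(s, reference), 1))
```

With only those three scores as the reference, 0.88 maps to 100%, 0.78 to 0% and 0.85 to about 70%. A larger and more varied reference set would give a more meaningful range.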

jhsmith12345 · July 17, 2022, 12:19pm

I see, thank you for educating me

pappachuck · July 18, 2022, 6:01am

It is more expensive and less effective than state-of-the-art alternatives.
Lots of good studies everywhere.

sandeepkhomne · July 18, 2022, 8:41am

Thanks for the suggestion and the link. Will have a look and see if I uncover something.
