Hi, I am using the Text Similarity Models (Embedding API) to compute a similarity score between two words. I am referring to the code snippet (copied below) from the blog post “Introducing Text and Code Embeddings in the OpenAI API”.
The result is a “similarity score”, sometimes called cosine similarity, between –1 and 1, where a higher number means more similarity.
The similarity scores I get are decimals that are all pretty close to each other, e.g.:
Castle – Palace - the score is 0.88
Building – Palace - the score is 0.85
Laptop – Palace - the score is 0.78
Is there a way to get the score as a percentage between 0 and 100% that maps to the similarity? E.g.:
Castle – Palace - the score should be around 95%
Building – Palace - the score should be around 50%
Laptop – Palace - the score should be around 10%
I tried looking into soft cosine, Euclidean distance, etc., but I am still unable to find a good solution. Unfortunately, I don’t fully understand the math behind it.
Any help is greatly appreciated. Thanks!
import openai
import numpy as np

# Request embeddings for both inputs in a single call
resp = openai.Embedding.create(
    input=["feline friends go", "meow"],
    engine="text-similarity-davinci-001")

embedding_a = resp['data'][0]['embedding']
embedding_b = resp['data'][1]['embedding']

# Dot product of the two embedding vectors, used as the similarity score
similarity_score = np.dot(embedding_a, embedding_b)
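
To make the request concrete, here is a rough sketch of the kind of mapping I mean: a plain linear (min-max) rescaling of the raw score onto 0 to 100%. The function names `cosine_similarity` and `to_percent` and the bounds `lo=0.75` and `hi=0.90` are my own assumptions, picked only to bracket the example scores above; they do not come from the blog or the API, and a linear mapping will not reproduce the exact percentages I listed.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two vectors: dot product divided by the product of their norms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def to_percent(score, lo=0.75, hi=0.90):
    """Linearly rescale a raw score from [lo, hi] to [0, 100], clipping outside values.

    lo and hi are assumed bounds (hypothetical, chosen by eye), not values from the API.
    """
    scaled = (score - lo) / (hi - lo)
    return float(np.clip(scaled, 0.0, 1.0) * 100.0)


# Raw scores from the examples in this post
examples = [("Castle - Palace", 0.88), ("Building - Palace", 0.85), ("Laptop - Palace", 0.78)]
for pair, score in examples:
    print(f"{pair}: {to_percent(score):.0f}%")
```

With these assumed bounds the three example scores come out at roughly 87%, 67% and 20%, so the spread is wider but the values still depend entirely on how `lo` and `hi` are chosen.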