Soft Cosine Measure vs Cosine Similarity

Hi Folks

Are the embedding vectors designed to be used only for similarity matching with the Cosine Similarity and Euclidean Distance methods, or would the Soft Cosine Measure also be applicable?

Is there a blog or reference covering all the mathematically relevant methods that can be used with the embedding vectors?

Thanks

You can use any distance function that works for you. But since the vectors are all normalized to unit length, most common measures are essentially equivalent: cosine similarity equals the dot product, and Euclidean distance is a monotonic function of it, so they all produce the same ranking.
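A minimal NumPy sketch of that equivalence (the 1536-dimension size is just an arbitrary example, not tied to any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors normalized to unit length, standing in for embeddings.
a = rng.normal(size=1536)
b = rng.normal(size=1536)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)
sq_euclidean = np.sum((a - b) ** 2)

print(cosine, dot)                   # equal on unit vectors
print(sq_euclidean, 2 - 2 * cosine)  # ||a - b||^2 == 2 - 2*cos
```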


The embedding vectors are simply internal, model-specific semantic representations that are quite indescribable. Imagine a game of 20 questions, but with 2000 questions, each answered on a continuous scale: a question could be "looks like Latin text" vs. "looks like Chinese text", or "how much does it talk about space". We really don't know. The individual dimensions aren't really meaningful by themselves, and you can get strange results, such as the many extra dimensions of larger models not necessarily giving better quality.

Here's the documentation for sentence-transformers. It doesn't just show the included cosine and dot-product scoring; it also covers different semantic methods you might use for particular and varied cases, like multi-sentence total match scores.

https://www.sbert.net/docs/usage/semantic_textual_similarity.html
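For instance, here is a minimal sketch of the cosine-similarity usage that page covers, assuming the sentence-transformers package is installed; the model name and sample sentences below are illustrative choices, not taken from the docs:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat.",
    "A feline rests on a rug.",
    "Stock markets fell sharply today.",
]

# encode() returns one embedding per sentence.
embeddings = model.encode(sentences, convert_to_tensor=True)

# util.cos_sim computes the full pairwise cosine-similarity matrix;
# the similar sentence pair should score higher than the unrelated one.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```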