Soft Cosine Measure vs Cosine Similarity

Hi Folks

Are the embedding vectors designed to be used only for similarity matching with the Cosine Similarity and Euclidean Distance methods, or would the Soft Cosine Measure also be applicable?

Is there a blog or reference covering all the mathematically relevant methods that can be used with the embedding vectors?

Thanks

You can use any distance function that works for you. But since the vectors are all normalized to unit length, most common measures are essentially equivalent: cosine similarity equals the dot product, and Euclidean distance is a monotonic function of it, so they all produce the same ranking.
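A minimal NumPy sketch of that equivalence (the 1536-dimension size is just an arbitrary example, not tied to any particular model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors normalized to unit length, standing in for embeddings.
a = rng.normal(size=1536)
b = rng.normal(size=1536)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)
sq_euclidean = np.sum((a - b) ** 2)

print(cosine, dot)                   # equal on unit vectors
print(sq_euclidean, 2 - 2 * cosine)  # ||a - b||^2 == 2 - 2*cos
```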


The embedding vectors are simply internal, model-specific semantic representations that are quite indescribable. Imagine a game of 20 questions, but with 2000 questions, each answered on a continuous scale: a question could be "looks like Latin text" vs. "looks like Chinese text", or "how much does it talk about space". We really don't know. The individual dimensions aren't really meaningful by themselves, and you can get strange results, such as the many extra dimensions of larger models not necessarily giving better quality.

Here's the documentation for sentence-transformers. It doesn't just show the included cosine and dot-product scoring; it also covers different semantic methods you might use for particular and varied cases, like multi-sentence total match scores.

https://www.sbert.net/docs/usage/semantic_textual_similarity.html
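For instance, here is a minimal sketch of the cosine-similarity usage that page covers, assuming the sentence-transformers package is installed; the model name and sample sentences below are illustrative choices, not taken from the docs:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat.",
    "A feline rests on a rug.",
    "Stock markets fell sharply today.",
]

# encode() returns one embedding per sentence.
embeddings = model.encode(sentences, convert_to_tensor=True)

# util.cos_sim computes the full pairwise cosine-similarity matrix;
# the similar sentence pair should score higher than the unrelated one.
scores = util.cos_sim(embeddings, embeddings)
print(scores)
```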