Embedding Results Scale Seems Off

curt.kennedy · July 14, 2023, 2:42pm

The minimum cosine similarity of 0.7 from ada-002 is a “feature”.

You’d think it should have a min of -1, right? Well, it doesn’t. The model isn’t isotropic, so in practice all cosine similarities land between roughly 0.7 and 1 instead of ranging from -1 to 1.

You could batch process a set of embeddings and use PCA to transform this feature out of them, but that could be more work than is necessary.
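For anyone who does want to try the PCA route, here is a minimal sketch of one common way to do it: mean-center the batch, project out the top principal direction(s), and re-normalize. It assumes NumPy is available; the function names, the `n_remove` parameter, and the synthetic data are illustrative choices of mine, not something from ada-002 or the API, and the right number of components to drop for real embeddings is something you would have to tune.

```python
import numpy as np

def remove_common_component(embeddings, n_remove=1):
    """Center a batch of embeddings and project out the top principal directions.

    n_remove is a hypothetical tuning knob; 1 is just a starting point.
    """
    X = np.asarray(embeddings, dtype=float)
    Xc = X - X.mean(axis=0)                      # remove the shared offset
    # Rows of Vt are the principal directions of the centered batch.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:n_remove]
    Xc = Xc - Xc @ top.T @ top                   # project out the dominant directions
    norms = np.linalg.norm(Xc, axis=1, keepdims=True)
    return Xc / np.clip(norms, 1e-12, None)      # back to unit length

def mean_offdiag_cosine(unit_vectors):
    """Average cosine similarity over all distinct pairs of unit vectors."""
    sims = unit_vectors @ unit_vectors.T
    n = sims.shape[0]
    return (sims.sum() - np.trace(sims)) / (n * (n - 1))

# Synthetic stand-in for real embeddings: random vectors plus a large shared
# offset, which squeezes all pairwise similarities into a narrow high band.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 32)) + 3.0
raw /= np.linalg.norm(raw, axis=1, keepdims=True)

cleaned = remove_common_component(raw, n_remove=1)
print("mean similarity before:", mean_offdiag_cosine(raw))
print("mean similarity after: ", mean_offdiag_cosine(cleaned))
```

Note that the mean and principal directions are fit on the batch, so any new query embedding would need the same mean and directions applied to it before comparing.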

Instead, what I do is calibrate my correlations to the range the model actually produces, along the lines of a “0.7 to 1” mapping.

Also, tighten the limits of “closeness”: instead of thinking everything within 0.1 is close, think more like 0.01 or even 0.001.
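A rough sketch of what that calibration could look like in code. This is one possible reading, assuming a simple linear map that sends 0.7 to 0 and 1 to 1 and clamps anything outside; the function name and the `floor`/`ceiling` parameters are hypothetical, and the 0.7 floor is approximate, so measure it on your own data.

```python
def rescale_similarity(cos_sim, floor=0.7, ceiling=1.0):
    """Linearly map a raw cosine similarity from [floor, ceiling] onto [0, 1].

    floor=0.7 is an assumed empirical minimum, not a guaranteed bound.
    """
    scaled = (cos_sim - floor) / (ceiling - floor)
    # Clamp in case a raw value falls slightly outside the assumed range.
    return min(1.0, max(0.0, scaled))

for raw in (0.70, 0.80, 0.85, 0.99):
    print(raw, "->", round(rescale_similarity(raw), 3))
```

On this rescaled axis a raw gap of 0.01 becomes about 0.033, which is still small, so the tighter “closeness” thresholds still apply even after calibrating.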
