Yes, I am aware of this. But it’s particular to the new model(s), and it differs from what the general math would lead you to expect.
For example, in general, you should get values ranging from -1 to +1.
But yes, here it looks like 0 to 1 for the latest models. Haven’t fully tested it myself.
Also, in general, any two random, uncorrelated things will have a dot product close to 0 (orthogonal).
If correlated, the dot product is closer to +1.
If anti-correlated, the dot product is closer to -1.
The corresponding angles are (see the sketch below):
0 degrees, for dot product +1
90 degrees, for dot product 0
180 degrees, for dot product -1.
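To make that mapping concrete, here is a minimal sketch (assuming NumPy) that converts dot products of unit vectors into angles with the inverse cosine, and also shows that two random high-dimensional unit vectors come out nearly orthogonal:

```python
# Minimal sketch: dot products of unit vectors and the angles they correspond to.
import numpy as np

a = np.array([1.0, 0.0, 0.0])            # a unit vector
same = np.array([1.0, 0.0, 0.0])         # identical direction
orthogonal = np.array([0.0, 1.0, 0.0])   # perpendicular direction
opposite = np.array([-1.0, 0.0, 0.0])    # opposite direction

for name, b in [("same", same), ("orthogonal", orthogonal), ("opposite", opposite)]:
    dot = float(np.dot(a, b))
    # clip guards against tiny floating-point drift outside [-1, 1]
    angle = np.degrees(np.arccos(np.clip(dot, -1.0, 1.0)))
    print(f"{name:10s} dot={dot:+.1f} angle={angle:6.1f} degrees")
# same       dot=+1.0 angle=   0.0 degrees
# orthogonal dot=+0.0 angle=  90.0 degrees
# opposite   dot=-1.0 angle= 180.0 degrees

# Two random high-dimensional unit vectors land near a dot product of 0:
rng = np.random.default_rng(0)
x, y = rng.standard_normal(3072), rng.standard_normal(3072)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
print(f"random pair dot={np.dot(x, y):+.3f}")  # close to 0
```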
This is how the general math works out, but not all embedding engines these days have faithful mathematical properties, probably because of how they generate the vectors. Plus, it may not matter much if all you are doing is getting a relative list of top rankings.
Users need to be aware of these differences and peculiarities. Already folks are saying the new model is worse because they don’t get a bunch of correlations above 0.8 anymore. These folks are misleading themselves.
Instead, they should be looking at the relative rankings the new model produces. So compare rank 1, 2, 3, 4, 5 from one model against rank 1, 2, 3, 4, 5 from the other, and see which ordering makes more sense.
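For example, a quick way to compare models is to rank the same candidate texts against the same query with each one and look at the orderings, not the raw scores. This is just a sketch; the `scores_old` / `scores_new` arrays are made-up numbers standing in for dot products of a query embedding against each document embedding:

```python
# Sketch of comparing relative rankings between two embedding models.
import numpy as np

docs = ["doc A", "doc B", "doc C", "doc D", "doc E"]
scores_old = np.array([0.91, 0.88, 0.86, 0.84, 0.83])  # hypothetical older model: everything "high"
scores_new = np.array([0.52, 0.31, 0.44, 0.28, 0.47])  # hypothetical newer model: lower, wider spread

def ranking(scores):
    # Indices of documents sorted from most to least similar.
    return list(np.argsort(-scores))

print("old model order:", [docs[i] for i in ranking(scores_old)])
print("new model order:", [docs[i] for i in ranking(scores_new)])
# The absolute scores differ a lot between models, but what matters is
# which ordering puts the genuinely relevant documents first.
```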
Here is why the math works this way, from ChatGPT. The formula is general: the example is in 3 dimensions, but it extends to arbitrary dimensions, like 3072, as well. Also, the magnitudes coming out of the embedding engine are all 1.0, so there is no need to compute the square root of the sum of the squares; that is wasted computation here, since you are dividing by 1.0, which doesn't change anything. If your vectors don't always have length 1.0, then you should scale your dot products by dividing by the lengths of each vector, as shown in the formula.
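Written out, that formula is just the standard cosine similarity; a small 3-dimensional example (made-up numbers, purely for illustration) is worked below, and the same expression holds for 3072 dimensions:

$$\cos\theta = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\;\sqrt{\sum_{i=1}^{n} B_i^2}}$$

For example, with $A = (1, 2, 3)$ and $B = (4, 5, 6)$:

$$\cos\theta = \frac{1\cdot4 + 2\cdot5 + 3\cdot6}{\sqrt{1^2+2^2+3^2}\,\sqrt{4^2+5^2+6^2}} = \frac{32}{\sqrt{14}\,\sqrt{77}} \approx 0.9746$$

When both vectors are already unit length, the denominator is $1 \cdot 1 = 1$, and the cosine similarity is just the dot product $A \cdot B$.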
Here is the graph of the inverse cosine. The input ranges over -1 to 1, and the output runs from 0 to π radians, or 0 to 180 degrees.
The -1 to 1 input comes from your cosine similarity, or dot product. Most people do not compute the angle in practice, because they only care about relative rankings, and the inverse cosine is extra computation that preserves the ranking anyway (a larger dot product always means a smaller angle).
Also, fun fact: the new models produce their smaller embeddings by simply truncating the full vector and re-scaling what's left back to length 1.0. So you can do this on your own to carve out arbitrary-dimension embeddings from the new models. If you store the original 3072-dimension vector, you can change the size of your vectors later without re-embedding. This is useful if you want to reduce dimensions to speed up search, at the cost of some embedding accuracy.
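Here is a minimal sketch of that idea, assuming you have the full 3072-dimension embedding stored as a NumPy array; the 256 target size is just an arbitrary choice for illustration:

```python
# Sketch: carve a shorter embedding out of a stored full-length one by
# keeping the first k dimensions and re-scaling back to unit length.
import numpy as np

def shorten(embedding: np.ndarray, k: int) -> np.ndarray:
    truncated = embedding[:k]                      # keep only the first k dimensions
    return truncated / np.linalg.norm(truncated)   # re-scale to length 1.0

# Stand-in for a stored full-size embedding (a real one would come from the API).
rng = np.random.default_rng(1)
full = rng.standard_normal(3072)
full /= np.linalg.norm(full)

short = shorten(full, 256)
print(short.shape, float(np.linalg.norm(short)))   # (256,) 1.0 (up to floating-point error)
```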