Is it possible to achieve embeddings cosine similarity approaching -1?

The closest I’ve been able to achieve is -0.003850111554290606. Is perfect opposition impossible because all words share context and some semantic content? Is there research on this?
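
For anyone who wants to reproduce this kind of measurement, here's a minimal sketch using the official `openai` Python client. The model and the word pair below are just illustrative assumptions, not necessarily what produced the number above:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str, model: str = "text-embedding-3-large") -> np.ndarray:
    """Fetch one embedding vector for `text`."""
    resp = client.embeddings.create(model=model, input=text)
    return np.array(resp.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity: a.b / (|a| |b|)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative word pair only, not the pair behind the number above.
print(cosine_similarity(embed("hot"), embed("cold")))
```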

Welcome to the community!

I don’t know if that’s a meaningful result; I suspect it’s effectively 0. It could just be floating-point noise.

I don’t think it’s possible with text-embedding-3, because it doesn’t look like it’s been trained for that.

If you look at ada-2, it had a minimum cosine similarity of around 0.6.

I suspect the reason it doesn’t work is that it would be difficult to train. What would a good -1 training pair even look like? It’s possible that this is more of a philosophical question than a technology one :thinking:

Seems like one way to have the embeddings cosine similarity approach -1 is to reduce the number of dimensions.

‘fish’ and ‘bicycle’ have an embedding cosine similarity of -0.9776305952877404 in two dimensions.

Where can I find more research on this? For example, fish and bicycle seem very, very dissimilar (almost perfect opposites) in two dimensions but merely dissimilar in 3072 dimensions (text-embedding-3-large).
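
Since the text-embedding-3 models accept a `dimensions` parameter, something like this minimal sketch should reproduce the effect; exact values will vary:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str, dims: int) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=text,
        dimensions=dims,  # request a truncated, re-normalized embedding
    )
    return np.array(resp.data[0].embedding)

for dims in (2, 256, 3072):
    a, b = embed("fish", dims), embed("bicycle", dims)
    sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    print(f"{dims:>4} dims: cosine similarity = {sim:+.4f}")
```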

I made a post a while ago here:

This will make your embeddings more isotropic (more spread out) and get you closer to -1. But it requires post-processing a large batch of previous embeddings.
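
As an illustration of that kind of post-processing, one common approach is simple mean-centering followed by re-normalization. This is just a sketch of one possible recipe, not necessarily the exact procedure in the linked post:

```python
import numpy as np

def center_and_normalize(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (n_vectors, n_dims) array of stored embedding vectors."""
    centered = embeddings - embeddings.mean(axis=0)  # remove the shared bias direction
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / norms  # back to unit length so cosine similarity is well-behaved
```

After centering, the shared mean component no longer inflates every pairwise dot product, so near-opposite pairs can land much closer to -1.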

But the what/how/why of what’s causing the bias in the embeddings gets into potential biases in the hidden layers that create the embedding vectors.

So ultimately, the most practical solution is to adjust your thresholds for each embedding model you encounter, as they all seem pretty different and do not conform to normal vector-geometry expectations.

I’m inspired by embeddings and thought it might be useful to describe them in a fun, accessible way to spark a broader dialogue. Who else agrees?

Interesting post by “Mishtert T” discussing the algebra of embeddings. It makes the point that queen - woman + man ≈ king.

When I tried this with 30 dimensions, I got a cosine similarity of 0.5412786418597273. Does that seem low?
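
Here’s a minimal sketch of that check, assuming text-embedding-3-large truncated to 30 dimensions via the `dimensions` parameter (the exact model used above isn’t stated, so treat that as an assumption):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(
        model="text-embedding-3-large", input=text, dimensions=30
    )
    return np.array(resp.data[0].embedding)

# Build the analogy vector and compare it to the target word.
composite = embed("queen") - embed("woman") + embed("man")
king = embed("king")
sim = float(np.dot(composite, king)
            / (np.linalg.norm(composite) * np.linalg.norm(king)))
print(f"cos(queen - woman + man, king) = {sim:.4f}")
```

For context, the queen/king analogy was originally demonstrated on word2vec-style word vectors, so a lower similarity from a sentence-level embedding model isn’t necessarily surprising.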