Ada-002 Embeddings Math and Cosine Similarity

Using Ada-002, I retrieved embeddings for the following:

v1=“hickory dickory dock”

I then created v5 by summing v2, v3, and v4 element by element.
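Here is a minimal sketch of that element-wise sum. It assumes each embedding is a plain list of floats and that all of them have the same length (Ada-002 returns 1536-dimensional vectors). The three short vectors are made-up placeholders, not real embeddings:

```python
def add_vectors(*vectors):
    """Return the element-wise sum of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    # zip(*vectors) groups the i-th component of every vector together
    return [sum(components) for components in zip(*vectors)]

# Hypothetical toy stand-ins for v2, v3, v4 (3 dims instead of 1536)
v2 = [0.1, 0.2, 0.3]
v3 = [0.0, -0.1, 0.4]
v4 = [0.5, 0.0, -0.2]
v5 = add_vectors(v2, v3, v4)
```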

I expected the cosine similarity of v1 and v5 to be close to 1.0, but it came out at about 0.0138.
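The next example is a minimal cosine-similarity sketch in plain Python, useful for double-checking a number like this. Cosine similarity is dot(a, b) / (|a| |b|), so it depends only on the direction of the vectors. Scaling v5 (for example, dividing the sum by 3 to get the mean) therefore does not change the result. The vectors are hypothetical toy values, not Ada-002 output:

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|) for equal-length float vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for v1 and v5
v1 = [0.6, 0.1, 0.5]
v5 = [1.2, 0.2, 1.0]            # same direction as v1, twice the length
v5_mean = [x / 3 for x in v5]   # averaging instead of summing only rescales

print(cosine_similarity(v1, v5))       # parallel vectors: 1.0 up to rounding
print(cosine_similarity(v1, v5_mean))  # same value; scaling has no effect
```

OpenAI's documentation states that its embeddings are normalized to length 1, so for raw Ada-002 vectors the dot product alone gives the cosine similarity. A sum such as v5 is generally not unit length, so the full formula with both norms is needed when comparing it.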

Is my expectation valid?


I asked this when I was thinking in terms of downloading word embeddings and reusing them across projects. It turns out that treating Ada-002 as if word embeddings were its strength misses the semantic richness of its document embeddings. Besides, they are very cost-effective, which is nice.

