Using Ada-002, I retrieved embeddings for the following:
v1=“hickory dickory dock”
v2=“hickory”
v3=“dickory”
v4=“dock”
I then created v5 by adding the elements of v2, v3, and v4.
I would expect the cosine similarity of v1 and v5 to be close to 1.0, but it was only about 0.0138.
Is my expectation valid?
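For concreteness, here is a minimal sketch of the setup. It assumes the openai Python SDK (v1+ client style) with an API key in the environment, plus numpy; the `embed` and `cosine` helpers are just illustrative names, not anything from the library:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Fetch the ada-002 embedding for a single string."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("hickory dickory dock")
v2, v3, v4 = embed("hickory"), embed("dickory"), embed("dock")

v5 = v2 + v3 + v4  # element-wise sum of the word vectors

print(cosine(v1, v5))
```

Note that cosine similarity is scale-invariant, so summing the three word vectors versus averaging them gives the same similarity score.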
I asked this back when I was thinking in terms of downloading word embeddings and reusing them across projects. It turns out that treating Ada-002 as a source of word embeddings misses what it is actually good at: the semantic richness of document-level embeddings. Besides, they are very cost-effective, which is nice.