Some questions about text-embedding-ada-002’s embedding

Thanks for pointing this massive error out!

I updated the code above with this line:

```python
U = U[D:, :]  # "All But The Top": keep everything except the top D components
```
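For anyone landing here later, this is roughly what the full post-processing looks like in context. It is a minimal self-contained sketch, not my production code; the function name `all_but_the_top` and the variable `X` (embeddings stacked row-wise) are my own. Subtracting the projection onto the top D components, as below, is equivalent to projecting onto the remaining components like the line above does:

```python
import numpy as np

def all_but_the_top(X: np.ndarray, D: int = 15) -> np.ndarray:
    """Post-processing from Mu & Viswanath (2018): center the embeddings,
    then remove their projection onto the top D principal components."""
    mu = X.mean(axis=0)                  # common mean vector
    Xc = X - mu                          # center the embeddings
    # Rows of Vt are the principal directions, sorted by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    top = Vt[:D, :]                      # the top D directions to remove
    return Xc - Xc @ top.T @ top         # subtract the projection onto them
```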

It is working much better now! The cosine similarities are noticeably more spread out.

But now I’m wondering how many dimensions I should really drop. D is set to 15 right now, which roughly matches the D ≈ d/100 heuristic from the All-But-The-Top paper (ada-002 embeddings have d = 1536 dimensions), but anyone who uses this should tune D against their own data. One way to examine it is sketched below.
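To get a feel for how many components dominate, one option is to look at the spectrum directly. A sketch, assuming `X` is the raw embedding matrix from above:

```python
import numpy as np

Xc = X - X.mean(axis=0)                  # center first, as in ABTT
_, S, _ = np.linalg.svd(Xc, full_matrices=False)
var_ratio = S**2 / np.sum(S**2)          # variance explained per component

# See how much of the total variance the top D components soak up
for D in (1, 5, 10, 15, 20, 30):
    print(f"top {D:>2} components: {var_ratio[:D].sum():.1%} of total variance")
```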

PS: I am not using this in production yet, so I have very little insight so far. But the variance is higher and the cosine similarities still make sense when the top components are dropped (ABTT). A quick way to sanity-check that is below.
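Here is the kind of quick check I mean, reusing the hypothetical `all_but_the_top` helper sketched earlier; a larger standard deviation of pairwise cosine similarities is the "more spread" I was referring to:

```python
import numpy as np

def pairwise_cosine_std(X: np.ndarray, sample: int = 500) -> float:
    """Std of pairwise cosine similarities over a random sample of rows."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=min(sample, len(X)), replace=False)
    V = X[idx] / np.linalg.norm(X[idx], axis=1, keepdims=True)
    sims = V @ V.T                                   # cosine similarity matrix
    return sims[np.triu_indices_from(sims, k=1)].std()

print("before:", pairwise_cosine_std(X))
print("after: ", pairwise_cosine_std(all_but_the_top(X, D=15)))
```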