OK, I coded up the algorithm and I’d say I got good results in preliminary testing. I now get cosine similarities that are positive, negative and zero across the embedding search space. The results seem to make sense too!
Only weird thing is that my max and min cosine similarities are ±0.1 instead of +/-1. I am only using the top 15 dimensions (D/100 for ada-002). Maybe this is reducing the energy somehow? Anyway, the relative correlations and anti-correlations seem to make sense. Plus my top correlations seem better than the original ones.
Will have to test more to see, but so far the algorithm in the paper seems to work!