Ah, OK, your graph makes sense now.
Well, so you are saying that there is a 5% dot-product variation when you go from 3072 dimensions to 256 dimensions.
This isn’t too surprising to me. It suggests that 95% of the important variation is captured in the first 256 dimensions. So I am guessing they ordered the dimensions, using PCA or something similar, from most important to least important, which is why the vectors can simply be truncated down to fewer dimensions and re-scaled.
You get a huge dimension reduction … and a huge search speed-up, and a reduced storage footprint … but this speed trade comes with a loss in quality. But 5%? Sounds pretty good to me, although this is not any real-world indication of performance.
What really matters, besides MTEB, is whether the results even make sense, even at 3072 dimensions.
In general, the sentiment I’ve gathered is that the newer models are better, and also have more dynamic range (0 to 1, instead of 0.7 to 1 from ada-002), so new thresholds need to be used.
I am planning on evaluating the new models soon and adjusting my thresholds.
Any opinions on retrieval or semantic-similarity performance, in light of this 5% variation?
The cool thing, and one thing I wanted to emphasize in this thread, is you can carve out your custom dimensions from the large model. So you can go from 3072 to 256, and anything in between, to custom tune your exact speed vs. quality performance trades.
And over time, if you keep your original 3072 vector, you can re-map and tune further.
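A minimal sketch of that carve-out, assuming the model front-loads information so a simple slice-and-renormalize works (the `truncate_embedding` helper name is mine, and the 8-dim vector just stands in for a real 3072-dim embedding):

```python
import numpy as np

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and re-normalize to unit length,
    so dot products remain cosine similarities."""
    v = np.asarray(vec, dtype=np.float64)[:dims]
    return v / np.linalg.norm(v)

# Toy stand-in for a stored full-size embedding
rng = np.random.default_rng(0)
full = rng.normal(size=8)
full /= np.linalg.norm(full)

# Carve out any smaller size you like, e.g. half the dims
small = truncate_embedding(full, 4)
print(np.linalg.norm(small))  # back to unit length after truncation
```

Since you keep the original full-size vector, you can re-run this at a different `dims` later to re-tune the speed/quality trade without re-embedding anything. (For what it's worth, the text-embedding-3 API also accepts a `dimensions` parameter that does this server-side.)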