Poor embedding performance using ada for Portuguese

It works for most people because they aren’t paying attention to the correlation values; they only grab the top-K highest-scoring results. So the anisotropic behavior of the model is swept under the rug.

They aren’t looking for de-correlated or orthogonal things, or measuring how uncorrelated their “top K” results really are. Another reason is that, with RAG, they feed the top-K answers into the LLM and let the LLM decide whether each one is related or not.

So because the LLM can also sort out the non-correlated results, it also sweeps the issue under the rug. :rofl:
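To see why the raw similarity numbers can be misleading, here is a small synthetic sketch (my own illustration, not tied to any particular model): when every embedding shares a large common component, as anisotropic models tend to produce, all pairwise cosine similarities come out high, yet top-K ranking still “works” because the relative order is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: all vectors hug one shared direction
# (a large non-zero mean component), drowning out the signal.
common = rng.normal(size=256)
common /= np.linalg.norm(common)
vecs = common + 0.02 * rng.normal(size=(100, 256))

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Every pair looks "highly similar", regardless of content
sims = [cosine(vecs[0], v) for v in vecs[1:]]
print(f"min={min(sims):.2f} max={max(sims):.2f}")
```

Even the least related pair scores far above what you would expect from isotropic vectors, which is why grabbing the top K without looking at the absolute values hides the problem.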

There are many theories about why this happens in these models, like overcompensation in the hidden states. It seems to happen in most embedding models, but it is really bad in ada-002.

Here is the paper discussing this:

Also, here is code where I implemented “ABTT” (All-But-The-Top), which essentially de-biases and de-correlates your embeddings.
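For reference, a minimal sketch of the published ABTT recipe (subtract the mean, then project out the top-d principal components) might look like the following. This is my own paraphrase of the algorithm from the paper, not the poster’s actual implementation, and `d` is a tunable parameter (the paper suggests roughly dim/100):

```python
import numpy as np

def abtt(embeddings, d=2):
    """All-But-The-Top post-processing.

    1. Subtract the mean embedding (de-bias).
    2. Remove the top-d principal components, which carry
       most of the common anisotropic energy (de-correlate).
    """
    mu = embeddings.mean(axis=0)
    centered = embeddings - mu
    # Principal directions via SVD of the centered matrix;
    # rows of vt are orthonormal right singular vectors.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:d]
    # Subtract each vector's projection onto the top-d directions
    return centered - centered @ top.T @ top
```

After this transform, the mean of the corpus is zero and the dominant shared directions are gone, so cosine scores between unrelated texts drop back toward zero instead of clustering at a high baseline.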