Discrepancy in embeddings precision

Using the API method I am getting, on average, 9 decimal places. That is more than sufficient, since the embedding engine scales all the vectors to unit length. Your ada-002 example has 1536 dimensions, so if you imagine a unit vector in this space with all components equal, each component is 1/sqrt(1536), which is 0.0255…, so the first significant digit already shows up at the 2nd decimal place. With higher-dimensional models like the old davinci embedding the components get smaller, but only by a factor of 10 at worst (about one more decimal place), so you are still good.
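
To put numbers on that, here is a quick sketch of the arithmetic. The helper name `equal_component` is made up for illustration, and the 12288 figure for the old davinci embedding is my recollection of its dimension, so treat it as an assumption.

```python
import math

def equal_component(dim: int) -> float:
    """Value of each component of a unit vector whose components are all equal."""
    return 1.0 / math.sqrt(dim)

# 1536 is the ada-002 dimension discussed above.
# 12288 is assumed here to be the dimension of the old davinci embedding.
for dim in (1536, 12288):
    print(f"dim={dim}: typical component ~ {equal_component(dim):.5f}")
```

Real embedding components vary around this scale rather than being equal, but it gives a feel for where the significant digits start.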

So, as stated earlier, for this model dimension anything beyond 6-7 decimal places isn’t carrying much information. The extra digits can actually hurt if you store your intermediate embeddings as strings in a database: dropping to the lower precision cuts the DB size roughly in half, which is better too.
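
If you want to check this on your own data, something like the following rough sketch should do. It uses a random unit vector as a stand-in for a real embedding (an assumption, not actual API output) and JSON text as a stand-in for however you serialize to the database; the function names are just for illustration.

```python
import json
import math
import random

def random_unit_vector(dim: int, seed: int = 0) -> list[float]:
    """Stand-in for an embedding: a random vector scaled to unit length."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

full = random_unit_vector(1536)
rounded = [round(x, 7) for x in full]  # keep 7 decimal places

# Size of each vector when stored as a JSON string.
full_size = len(json.dumps(full))
rounded_size = len(json.dumps(rounded))
similarity = cosine(full, rounded)

print(f"full precision: {full_size} chars")
print(f"7 decimals:     {rounded_size} chars")
print(f"cosine(full, rounded) = {similarity:.12f}")
```

Swap in one of your real embeddings for `full` to see the size difference and how little the similarity moves.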
