Discrepancy in embeddings precision

Using the Python library provided by OpenAI, embeddings requests return floats with up to 18 decimal places of precision. Using other methods (curl, hyper/reqwest) only returns floats with about half that level of precision.

Anyone should be able to reproduce this by simply copy/pasting the example request provided in the documentation:

curl https://api.openai.com/v1/embeddings \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "The food was delicious and the waiter...",
       "model": "text-embedding-ada-002"}'

vs

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.Embedding.create(
  model="text-embedding-ada-002",
  input="The food was delicious and the waiter..."
)

Unless there is something that I’m missing (quite possible), this would seem to be a pretty big problem for anyone interested in using or building a library outside of the one provided in Python.

This was discussed before.

The extra decimal places are essentially noise and can be discarded. If you’d like proof, try performing some distance tests and you’ll see no difference.

The reasoning behind this (possibly incorrect, as I can’t access my previous conversations) is that it’s sent as a single-precision float (about 7 significant decimal digits) by default.

I believe there is a hidden parameter one can use to have it sent as a double.
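
Here is a rough sketch of the distance test I mean, using a synthetic unit vector as a stand-in for a real embedding (just to keep it self-contained):

import numpy as np

# Synthetic unit vector standing in for a real 1536-dim embedding.
rng = np.random.default_rng(0)
v = rng.normal(size=1536)
v /= np.linalg.norm(v)           # embeddings come back unit-length
v_trunc = np.round(v, 7)         # discard the "extra" decimal places

cosine = float(v @ v_trunc / (np.linalg.norm(v) * np.linalg.norm(v_trunc)))
print(cosine)                    # ~1.0, differing only around the 1e-8 level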

Thanks. I hope that’s correct. I spent the better part of a week writing a Rust library only to notice the apparent discrepancy in testing the embeddings endpoint. I’ll try some distance tests as you suggest.

Again, I’m probably missing something. I’m quite sure it also had something to do with base64. There was a really nice write-up that was eventually published on GitHub. I’ll reply once I find it.

In the meantime you can see the difference through their library here. The answer is there.

Right, glancing through their source code to see if there was some missing parameter was one of the first things I tried, and I did notice the base64 encoding. I tried adding the header ("Accept-Encoding", "base64") to my Rust code, to no effect.

I believe it has to do with this segment:

# If a user specifies base64, we'll just return the encoded string.
# This is only for the default case.
if not user_provided_encoding_format:
    for data in response.data:

        # If an engine isn't using this optimization, don't do anything
        if type(data["embedding"]) == str:
            assert_has_numpy()
            data["embedding"] = np.frombuffer(
                base64.b64decode(data["embedding"]), dtype="float32"
            ).tolist()

which is created here

user_provided_encoding_format = kwargs.get("encoding_format", None)

My memory is still a bit fuzzy, but I believe you can actually include encoding_format in your request.
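
Something like this, assuming the parameter just passes through to the API (a sketch, not a verified recipe):

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# encoding_format is forwarded to the API; "float" returns plain floats,
# "base64" returns the packed float32 bytes as a base64 string.
resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter...",
    encoding_format="float",
)
print(resp["data"][0]["embedding"][:5])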

I found it!

Using the API method directly, I am getting, on average, 9 decimal places. This is more than sufficient, since all the vectors are scaled to unit length by the embedding engine. In your ada-002 example, the embedding has 1536 dimensions, so if you imagine a unit vector in this space with equal values, each component is 1/sqrt(1536), which is 0.0255…, so the significant digits only start at the second decimal place. Using higher-dimension models like the old davinci embedding, this could get worse, but only by a factor of 10, so you are still good.

So, as stated earlier, for this model dimension, anything beyond 6-7 decimal places isn’t carrying much information, and trimming it can actually help if you store your intermediate embeddings as strings in a database (the lower precision cuts the DB size roughly in half, which is better too).
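
A quick check of that arithmetic:

import math

dims = 1536                     # text-embedding-ada-002 output dimensions
print(1 / math.sqrt(dims))      # ~0.0255..., typical component size of a unit vector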

This is actually a fairly large issue. We definitely need, and should be able to get, determinism in the embeddings. A typical use case is dynamic retrieval: you retrieve passages and inject them into a prompt to answer a user’s question. I have found that the sorting order of the retrieved passages can change even with the same input question. When the sorting order changes, the entire prompt the passages are injected into changes. And when the prompt changes (even if it says the same thing, just in a different order), the completion changes, sometimes by a lot. This makes it hard to build any sort of tests that expect a deterministic output.
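
A minimal illustration of why that happens, with hypothetical similarity scores (not real data): two passages with nearly identical scores can swap places when the query embedding wobbles at the 1e-7 level, which changes the assembled prompt.

# Hypothetical cosine-similarity scores from two runs of the same query.
scores_run1 = {"passage_a": 0.83124631, "passage_b": 0.83124625, "passage_c": 0.79}
scores_run2 = {"passage_a": 0.83124619, "passage_b": 0.83124625, "passage_c": 0.79}

order1 = sorted(scores_run1, key=scores_run1.get, reverse=True)
order2 = sorted(scores_run2, key=scores_run2.get, reverse=True)
print(order1)  # ['passage_a', 'passage_b', 'passage_c']
print(order2)  # ['passage_b', 'passage_a', 'passage_c']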

Hmm, OK, so something interesting. Yesterday I tested getting embeddings using the openai Python library with the default settings. As suggested in this thread, embedding the same text twice resulted in slightly different embeddings; the cosine similarity between the two was ~0.999. I then used encoding_format="float", which overrides the default of base64, and lo and behold, embedding the same text twice resulted in identical vectors. So I changed my code to use that.

However, I went back this morning to try to figure out whether the small error in the default method was coming from OpenAI’s servers or from some issue in the Python library, and when I re-tested using the default settings (which use base64), this morning I get the same vector for the same text. So today it seems to be fixed. I used the same text and settings as yesterday. My guess is that either this was actually fixed between yesterday and today, or the discrepancy is semi-random and transient, which would be weird.

Anyway, I’d recommend using float as the encoding_format, but we’d need more testing to be sure. It would be great to get someone from OpenAI to look into this.
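
Roughly the kind of repeatability check I’m describing (a sketch, assuming the same openai 0.x Embedding API shown earlier in the thread):

import os
import numpy as np
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

def embed(text, fmt):
    # fmt is "float" for plain floats or "base64" (the library's default path)
    resp = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text,
        encoding_format=fmt,
    )
    return np.array(resp["data"][0]["embedding"], dtype=np.float64)

a = embed("The food was delicious and the waiter...", "float")
b = embed("The food was delicious and the waiter...", "float")

cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("identical:", np.array_equal(a, b), "cosine similarity:", cos)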

Interesting information to ponder:

import struct

# Round-trip some doubles through 32-bit floats, and separately through an
# 8-significant-digit decimal string, to see how much precision survives.
numbers64 = [0.013386417180299759, 1.0/3.0, 0.33333333]
numbers32 = [struct.unpack('!f', struct.pack('!f', number))[0] for number in numbers64]
numbers_rounded = [float(f'{number:.8g}') for number in numbers64]
numbers_restored = [struct.unpack('!f', struct.pack('!f', number))[0] for number in numbers_rounded]
print('64bit:   ', numbers64)
print('32bit:   ', numbers32)
print('Rounded: ', numbers_rounded)
print('Restored:', numbers_restored)
print('Equal:', numbers32[0] == numbers64[0], numbers32[1] == numbers32[2], numbers32 == numbers_restored)

Result:

64bit:    [0.013386417180299759, 0.3333333333333333, 0.33333333]
32bit:    [0.013386417180299759, 0.3333333432674408, 0.3333333432674408]
Rounded:  [0.013386417, 0.33333333, 0.33333333]
Restored: [0.013386417180299759, 0.3333333432674408, 0.3333333432674408]
Equal: True True True

GPUs do math in fp32 at best, and newer generations can run in fp16 modes.

The data returned from the API can be obtained as 32-bit float binary by using the base64 return type from the embeddings endpoint. That gives about 7 significant digits.

A 32-bit floating-point number is represented according to the IEEE 754 standard. This standard specifies that the 32 bits are divided into three parts: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction, also known as the significand or mantissa.

The number of significant decimal digits that can be represented by a 32-bit float is derived from the number of bits in the fraction part. This is because the fraction part carries the precision of the floating-point number.

The formula to convert the bit length to decimal digit length is D = \log_{10}(2^B), where D is the number of decimal digits and B is the number of binary digits, or bits.

If we substitute B=23 into the formula (since there are 23 bits in the fraction part), we get D = \log_{10}(2^{23}).

Calculating this expression, we find that D is approximately 6.924. A 32-bit float can represent approximately 6.92 significant decimal digits.
(-gpt4)
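
The same calculation in Python:

import math

mantissa_bits = 23                        # IEEE 754 single-precision fraction bits
print(math.log10(2 ** mantissa_bits))     # ~6.9237 significant decimal digits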

more math fun

The floating-point representation is essentially a form of scientific notation, base 2. A number is represented as 1.f \times 2^e, where f is the fraction and e is the exponent. The fraction is a binary fraction, and the exponent is a power of 2.

Even when the fraction part is very short (or even just 1), the resulting decimal number can be irrational. This is because the conversion from a binary fraction to a decimal fraction can result in an infinite repeating decimal.

For example, consider the simple binary fraction 0.1 (in binary). This is equivalent to 1/2 in decimal. But if we consider a binary fraction like 0.01, this is equivalent to 1/4 in decimal. And a binary fraction like 0.001 is equivalent to 1/8 in decimal.

As you can see, each successive binary fraction corresponds to a decimal fraction that is a power of 2. But not all powers of 2 can be represented as finite decimal fractions. For example, 1/10 in binary is an infinite repeating fraction, 0.00011001100110011… and so on.

Therefore, even a simple binary fraction can result in an irrational decimal number when converted.

(the AI kind of lost the plot here…)

encoding_format="float" is also giving me different embeddings.

Using this is probably the best option for stability, but it will never be perfect. The clocks in the GPUs and the asynchronous, non-associative ops of floating-point arithmetic will cause slight variation … it shouldn’t be enough to affect the dot products, but it will change the string representations of the vectors.
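
A toy illustration of that non-associativity (plain NumPy on the CPU, standing in for what reduction order on a GPU can do): the same three float32 values summed in a different order give different results.

import numpy as np

x = np.float32(1e8)
y = np.float32(-1e8)
z = np.float32(0.5)

# Floating-point addition is not associative; the summation order matters.
print((x + y) + z)   # 0.5
print(x + (y + z))   # 0.0  (y + z rounds back to -1e8 in float32)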