Discrepancy in embeddings precision

Using the Python library provided by OpenAI, embeddings requests return floats with up to 18 decimal places of precision. Using other methods (curl, hyper/reqwest) only returns floats with about half that level of precision.

Anyone should be able to reproduce this by simply copy/pasting the example request provided in the documentation:

curl https://api.openai.com/v1/embeddings \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "The food was delicious and the waiter...",
       "model": "text-embedding-ada-002"}'

vs

import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
openai.Embedding.create(
  model="text-embedding-ada-002",
  input="The food was delicious and the waiter..."
)

Unless there is something that I’m missing (quite possible), this would seem to be a pretty big problem for anyone interested in using or building a library outside of the one provided in Python.

This was discussed before.

The extra decimal places are essentially noise and can be discarded. If you’d like proof, try performing some distance tests and you’ll see no difference.

The reasoning behind this (possibly incorrect, as I can’t access my previous conversations) is that it’s sent as a single-precision float (about 7 significant decimal digits) by default.

I believe there is a hidden parameter one can use to have it sent as a double.
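
Here is a rough sketch of the distance test I mean, using a synthetic unit vector as a stand-in for a real embedding (just to keep it self-contained):

import numpy as np

# Synthetic unit vector standing in for a real 1536-dim embedding.
rng = np.random.default_rng(0)
v = rng.normal(size=1536)
v /= np.linalg.norm(v)           # embeddings come back unit-length
v_trunc = np.round(v, 7)         # discard the "extra" decimal places

cosine = float(v @ v_trunc / (np.linalg.norm(v) * np.linalg.norm(v_trunc)))
print(cosine)                    # ~1.0, differing only around the 1e-8 level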

Thanks. I hope that’s correct. I spent the better part of a week writing a Rust library only to notice the apparent discrepancy in testing the embeddings endpoint. I’ll try some distance tests as you suggest.

Again, I’m probably missing something. I’m quite sure it also had something to do with base64. There was a really nice write-up that was eventually published on GitHub. I’ll reply once I find it.

In the meantime you can see the difference through their library here. The answer is there.

Right, glancing through their source code to see if there was some missing parameter was one of the first things I tried, and I did notice the base64 encoding. I tried adding the header ("Accept-Encoding", "base64") to my Rust code, to no effect.

I believe it has to do with this segment:

# If a user specifies base64, we'll just return the encoded string.
# This is only for the default case.
if not user_provided_encoding_format:
    for data in response.data:

        # If an engine isn't using this optimization, don't do anything
        if type(data["embedding"]) == str:
            assert_has_numpy()
            data["embedding"] = np.frombuffer(
                base64.b64decode(data["embedding"]), dtype="float32"
            ).tolist()

which is created here

user_provided_encoding_format = kwargs.get("encoding_format", None)

My memory is still a bit fuzzy, but I believe you can actually include encoding_format in your request.
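
Something like this, assuming the parameter just passes through to the API (a sketch, not a verified recipe):

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

# encoding_format is forwarded to the API; "float" returns plain floats,
# "base64" returns the packed float32 bytes as a base64 string.
resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter...",
    encoding_format="float",
)
print(resp["data"][0]["embedding"][:5])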

I found it!

Using the API method directly, I am getting, on average, 9 decimal places. This is more than sufficient, since all the vectors are scaled to unit length by the embedding engine. In your ada-002 example, the embedding has 1536 dimensions, so if you imagine a unit vector in this space with equal values, each component is 1/sqrt(1536), which is 0.0255…, so the significant digits only start at the second decimal place. Using higher-dimension models like the old davinci embedding, this could get worse, but only by a factor of 10, so you are still good.

So, as stated earlier, for this model dimension, anything beyond 6-7 decimal places isn’t carrying much information, and trimming it can actually help if you store your intermediate embeddings as strings in a database (the lower precision cuts the DB size roughly in half, which is better too).
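
A quick check of that arithmetic:

import math

dims = 1536                     # text-embedding-ada-002 output dimensions
print(1 / math.sqrt(dims))      # ~0.0255..., typical component size of a unit vector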

This is actually a fairly large issue. We definitely need, and should be able to get, determinism in the embeddings. A typical use case is dynamic retrieval: you retrieve passages and inject them into a prompt to answer a user’s question. I have found that the sorting order of the retrieved passages can change even with the same input question. When the sorting order changes, the entire prompt the passages are injected into changes. And when the prompt changes (even if it says the same thing, just in a different order), the completion changes, sometimes by a lot. This makes it hard to build any sort of tests that expect a deterministic output.
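
A minimal illustration of why that happens, with hypothetical similarity scores (not real data): two passages with nearly identical scores can swap places when the query embedding wobbles at the 1e-7 level, which changes the assembled prompt.

# Hypothetical cosine-similarity scores from two runs of the same query.
scores_run1 = {"passage_a": 0.83124631, "passage_b": 0.83124625, "passage_c": 0.79}
scores_run2 = {"passage_a": 0.83124619, "passage_b": 0.83124625, "passage_c": 0.79}

order1 = sorted(scores_run1, key=scores_run1.get, reverse=True)
order2 = sorted(scores_run2, key=scores_run2.get, reverse=True)
print(order1)  # ['passage_a', 'passage_b', 'passage_c']
print(order2)  # ['passage_b', 'passage_a', 'passage_c']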

Hmm, OK, so something interesting. Yesterday I tested getting embeddings using the openai Python library with the default settings. As suggested in this thread, embedding the same text twice resulted in slightly different embeddings; the cosine similarity between the two was ~0.999. I then used encoding_format="float", which overrides the default of base64, and lo and behold, embedding the same text twice resulted in identical vectors. So I changed my code to use that.

However, I went back this morning to try to figure out whether the small error in the default method was coming from OpenAI’s servers or from some issue in the Python library, and when I re-tested using the default settings (which use base64), this morning I get the same vector for the same text. So today it seems to be fixed. I used the same text and settings as yesterday. My guess is that either this was actually fixed between yesterday and today, or the discrepancy is semi-random and transient, which would be weird.

Anyway, I’d recommend using float as the encoding_format, but we’d need more testing to be sure. It would be great to get someone from OpenAI to look into this.
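
Roughly the kind of repeatability check I’m describing (a sketch, assuming the same openai 0.x Embedding API shown earlier in the thread):

import os
import numpy as np
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

def embed(text, fmt):
    # fmt is "float" for plain floats or "base64" (the library's default path)
    resp = openai.Embedding.create(
        model="text-embedding-ada-002",
        input=text,
        encoding_format=fmt,
    )
    return np.array(resp["data"][0]["embedding"], dtype=np.float64)

a = embed("The food was delicious and the waiter...", "float")
b = embed("The food was delicious and the waiter...", "float")

cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("identical:", np.array_equal(a, b), "cosine similarity:", cos)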

Interesting information to ponder:

import struct

# Round-trip some doubles through 32-bit floats, and separately through an
# 8-significant-digit decimal string, to see how much precision survives.
numbers64 = [0.013386417180299759, 1.0/3.0, 0.33333333]
numbers32 = [struct.unpack('!f', struct.pack('!f', number))[0] for number in numbers64]
numbers_rounded = [float(f'{number:.8g}') for number in numbers64]
numbers_restored = [struct.unpack('!f', struct.pack('!f', number))[0] for number in numbers_rounded]
print('64bit:   ', numbers64)
print('32bit:   ', numbers32)
print('Rounded: ', numbers_rounded)
print('Restored:', numbers_restored)
print('Equal:', numbers32[0] == numbers64[0], numbers32[1] == numbers32[2], numbers32 == numbers_restored)

Result:

64bit:    [0.013386417180299759, 0.3333333333333333, 0.33333333]
32bit:    [0.013386417180299759, 0.3333333432674408, 0.3333333432674408]
Rounded:  [0.013386417, 0.33333333, 0.33333333]
Restored: [0.013386417180299759, 0.3333333432674408, 0.3333333432674408]
Equal: True True True

GPUs do math in fp32 at best, and newer generations can run in fp16 modes.

The data returned from the API can be obtained as 32-bit float binary by using the base64 return type from the embeddings endpoint. That gives about 7 significant digits.

A 32-bit floating-point number is represented according to the IEEE 754 standard. This standard specifies that the 32 bits are divided into three parts: 1 bit for the sign, 8 bits for the exponent, and 23 bits for the fraction, also known as the significand or mantissa.

The number of significant decimal digits that can be represented by a 32-bit float is derived from the number of bits in the fraction part. This is because the fraction part carries the precision of the floating-point number.

The formula to convert the bit length to decimal digit length is D = \log_{10}(2^B), where D is the number of decimal digits and B is the number of binary digits, or bits.

If we substitute B=23 into the formula (since there are 23 bits in the fraction part), we get D = \log_{10}(2^{23}).

Calculating this expression, we find that D is approximately 6.924. A 32-bit float can represent approximately 6.92 significant decimal digits.
(-gpt4)
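
The same calculation in Python:

import math

mantissa_bits = 23                        # IEEE 754 single-precision fraction bits
print(math.log10(2 ** mantissa_bits))     # ~6.9237 significant decimal digits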

more math fun

The floating-point representation is essentially a form of scientific notation, base 2. A number is represented as 1.f \times 2^e, where f is the fraction and e is the exponent. The fraction is a binary fraction, and the exponent is a power of 2.

Even when the fraction part is very short (or even just 1), the resulting decimal number can be irrational. This is because the conversion from a binary fraction to a decimal fraction can result in an infinite repeating decimal.

For example, consider the simple binary fraction 0.1 (in binary). This is equivalent to 1/2 in decimal. But if we consider a binary fraction like 0.01, this is equivalent to 1/4 in decimal. And a binary fraction like 0.001 is equivalent to 1/8 in decimal.

As you can see, each successive binary fraction corresponds to a decimal fraction that is a power of 2. But not all powers of 2 can be represented as finite decimal fractions. For example, 1/10 in binary is an infinite repeating fraction, 0.00011001100110011… and so on.

Therefore, even a simple binary fraction can result in an irrational decimal number when converted.

(the AI kind of lost the plot here…)

encoding_format="float" is also giving me different embeddings.

Using this is probably the best option for stability, but it will never be perfect. The clocks in the GPUs and the asynchronous, non-associative ops of floating-point arithmetic will cause slight variation … it shouldn’t be enough to affect the dot products, but it will change the string representations of the vectors.
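
A toy illustration of that non-associativity (plain NumPy on the CPU, standing in for what reduction order on a GPU can do): the same three float32 values summed in a different order give different results.

import numpy as np

x = np.float32(1e8)
y = np.float32(-1e8)
z = np.float32(0.5)

# Floating-point addition is not associative; the summation order matters.
print((x + y) + z)   # 0.5
print(x + (y + z))   # 0.0  (y + z rounds back to -1e8 in float32)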