Discrepancy in embeddings precision

Using the Python library provided by OpenAI, embeddings requests return floats with up to 18 decimal places of precision. Other methods (curl, hyper/reqwest) return floats with only about half that precision.

Anyone should be able to reproduce this by simply copy/pasting the example request provided in the documentation:

curl https://api.openai.com/v1/embeddings \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "The food was delicious and the waiter...",
       "model": "text-embedding-ada-002"}'


import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.Embedding.create(
    input="The food was delicious and the waiter...",
    model="text-embedding-ada-002",
)

Unless there is something that I’m missing (quite possible), this would seem to be a pretty big problem for anyone interested in using or building a library outside of the one provided in Python.


This was discussed before.

The extra decimal points are essentially noise and can be discarded. If you’d like proof, try and perform some distance tests and you’ll see no difference.
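The distance test can be sketched without calling the API at all; a minimal example with a simulated unit-length vector (assuming 1536 dimensions, as with ada-002):

```python
import numpy as np

# Simulate a unit-length embedding like those returned by the API.
rng = np.random.default_rng(0)
v = rng.normal(size=1536)
v /= np.linalg.norm(v)

# Truncate to 7 decimal places, roughly what a float32 round-trip keeps.
v_trunc = np.round(v, 7)

# Cosine similarity between full precision and truncated vectors.
cos_sim = np.dot(v, v_trunc) / (np.linalg.norm(v) * np.linalg.norm(v_trunc))
print(cos_sim)  # effectively 1.0
```

Any distance ranking built on these vectors is unchanged by dropping the extra digits.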

The reasoning behind this (possibly incorrect, as I can’t access my previous conversations) is that it’s sent as a float by default, which only carries about 7 significant decimal digits.

I believe there is a hidden parameter one can use to have it sent as a double.


Thanks. I hope that’s correct. I spent the better part of a week writing a Rust library only to notice the apparent discrepancy in testing the embeddings endpoint. I’ll try some distance tests as you suggest.

Again, I’m probably missing something. I’m quite sure it also had something to do with base64. There was a really nice write-up that was eventually published on GitHub. I’ll reply once I find it.

In the meantime you can see the difference through their library here. The answer is there.

Right, glancing through their source code to see if there was some missing parameter was one of the first things I tried, and I did notice the base64 encoding. I tried adding the header "Accept-Encoding: base64" to my Rust code, to no effect.

I believe it’s to do with this segment:

# If a user specifies base64, we'll just return the encoded string.
# This is only for the default case.
if not user_provided_encoding_format:
    for data in response.data:

        # If an engine isn't using this optimization, don't do anything
        if type(data["embedding"]) == str:
            data["embedding"] = np.frombuffer(
                base64.b64decode(data["embedding"]), dtype="float32"
            ).tolist()

which is created here

user_provided_encoding_format = kwargs.get("encoding_format", None)
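For anyone re-implementing this outside Python, the round-trip the library performs can be sketched standalone; a minimal example (the values are made up) that encodes float32 bytes to base64 and decodes them back the same way:

```python
import base64

import numpy as np

# Encode some float32 values the way the API does for the base64 format.
original = np.array([0.0123456, -0.0345678, 0.0567891], dtype="float32")
b64 = base64.b64encode(original.tobytes()).decode("ascii")

# Decode back, mirroring the library's np.frombuffer call.
decoded = np.frombuffer(base64.b64decode(b64), dtype="float32")
print(decoded.tolist())
```

The round-trip is exact: the same float32 bytes come back, so nothing is lost to decimal formatting.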

My memory is still a bit fuzzy, but I believe you can actually include encoding_format in your request.
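If so, it should be possible to set it in a raw request as well; a sketch of the curl example from earlier, built in Python (assuming the API accepts encoding_format as a top-level request field, which is what the library's kwargs handling suggests — the request is constructed here but not sent):

```python
import json
import os
import urllib.request

# Same body as the documentation's curl example, plus encoding_format.
body = {
    "input": "The food was delicious and the waiter...",
    "model": "text-embedding-ada-002",
    "encoding_format": "float",  # or "base64"
}

req = urllib.request.Request(
    "https://api.openai.com/v1/embeddings",
    data=json.dumps(body).encode(),
    headers={
        "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would send it.
```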

I found it!


Using the API method I am getting, on average, 9 decimal places. This is more than sufficient, since all the vectors are scaled to unit length by the embedding engine. In your ada-002 example, there are 1536 dimensions, so if you imagine a unit vector in this space with equal values, each component is 1/sqrt(1536), which is 0.0255…, i.e. about 2 decimal places of leading magnitude. Using higher-dimensional models like the old davinci embedding, this could get worse, but only by a factor of 10 or so, so you are still fine.
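The arithmetic above is easy to check (the 12288-dimension figure for the old davinci embedding is my assumption):

```python
import math

# Equal-valued unit vector in 1536 dimensions (ada-002):
ada_component = 1 / math.sqrt(1536)
print(ada_component)  # ≈ 0.0255

# Assuming 12288 dimensions for the old davinci embedding:
davinci_component = 1 / math.sqrt(12288)
print(davinci_component)  # ≈ 0.009
```

Either way, component magnitudes sit around the second or third decimal place, so digits past the 7th contribute essentially nothing.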

So, as stated earlier, for this model dimension, anything beyond 6-7 decimal places isn’t carrying much information. Keeping the extra digits can actually be bad if you store your intermediate embeddings as strings in a database: the lower precision roughly cuts the DB size in half, which is better too.
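A quick sanity check of that storage claim, using a simulated 1536-dimension unit vector stored as comma-separated strings:

```python
import numpy as np

rng = np.random.default_rng(1)
v = rng.normal(size=1536)
v /= np.linalg.norm(v)

# Full float64 repr (~17 significant digits) vs 7 decimal places.
full = ",".join(repr(x) for x in v)
short = ",".join(f"{x:.7f}" for x in v)

print(len(full), len(short))  # short is roughly half the size
```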


This is actually a fairly large issue. We definitely need, and should be able to get, determinism in the embeddings. A typical use case: you might be doing some dynamic retrieval and then injecting retrieved passages into a prompt to answer a user’s question. I have found that the sort order of the returned embeddings can change even with the same input question. When the sort order changes, the entire prompt where the passages are injected changes. And when the prompt changes (even if it says the same thing, just in a different order), the completion changes, sometimes by a lot. This makes it hard to build any sort of tests that expect a deterministic output.


Hmm, ok, so something interesting. Yesterday I went and tested getting embeddings using the openai Python library with the default settings. As suggested in this thread, embedding the same text twice resulted in slightly different embeddings; the cosine similarity between the two embeddings was ~0.999. I then used encoding_format="float", which overrides the default of base64, and lo and behold, embedding the same text twice resulted in identical vectors. So I changed my code to use that.

However, I went back this morning to try to figure out whether the small error in the default method was coming from OpenAI’s servers or from some issue in the Python library, and when I re-tested using the default settings (which use base64), this morning I get the same vector for the same text. So today it seems to be fixed. I used the same text and settings as yesterday. My guess is that either this was actually fixed between yesterday and today, or the discrepancy is semi-random and transient, which would be weird.

Anyway, I guess I’d recommend using float as the encoding_format, but we’d need more testing to be sure. It would be great to get someone from OpenAI to look into this.