Discrepancy in embedding precision

Hmm, ok, something interesting. Yesterday I tested getting embeddings with the openai Python library using the default settings. As suggested in this thread, embedding the same text twice produced slightly different vectors; the cosine similarity between the two was ~0.999. I then set encoding_format="float", which overrides the default of base64, and lo and behold, embedding the same text twice gave identical vectors. So I switched to that in my code.

However, I went back this morning to figure out whether the small error in the default method was coming from OpenAI's servers or from the Python library itself, and when I re-tested with the default settings (which use base64), I got the same vector for the same text. So today it seems to be fixed, even though I used the same text and settings as yesterday. My guess is that either this was actually fixed between yesterday and today, or the discrepancy is semi-random and transient, which would be weird.

Anyway, I'd recommend using float as the encoding_format for now, but we'd need more testing to be sure. Would be great to get someone from OpenAI to look into this. A rough sketch of the test I ran is below.
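
In case anyone wants to reproduce this, here's roughly what I did (a minimal sketch; the model name text-embedding-3-small and the sample text are placeholders, not necessarily what I used):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "text-embedding-3-small"  # placeholder model, swap in whichever you use
TEXT = "The quick brown fox jumps over the lazy dog."  # placeholder text


def get_embedding(encoding_format=None):
    # encoding_format=None lets the library apply its default (base64,
    # decoded to floats client-side); "float" asks the API for floats directly.
    kwargs = {"model": MODEL, "input": TEXT}
    if encoding_format is not None:
        kwargs["encoding_format"] = encoding_format
    resp = client.embeddings.create(**kwargs)
    return np.array(resp.data[0].embedding)


def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Embed the same text twice per encoding format and compare the results.
for fmt in (None, "float"):
    v1 = get_embedding(fmt)
    v2 = get_embedding(fmt)
    print(f"encoding_format={fmt!r}: identical={np.array_equal(v1, v2)}, "
          f"cosine_sim={cosine_sim(v1, v2):.10f}")
```

Yesterday the default run showed identical=False with cosine_sim ~0.999, while the "float" run showed identical=True; this morning both came back identical.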
