Are vectors generated by text-embedding-3-small always the same for the same text input?

I am building a prototype for continuous ingestion of content into a vector database. I am using the text-embedding-3-small model to generate vectors before storing them in an Azure Search index.

The internal API I use to fetch newly created content is not perfect, which means I might fetch content that is already stored in the index. I am thinking this should not be a problem: if the model produces the same vector representation, then when I send that vector to the Azure Search index it would simply replace the current vector in the index. So my question is: will the model generate the same vector for the same input text on the second and subsequent calls to the model API?

Embedding models should not be used to verify identical content or to avoid duplication. You can use a hash algorithm for that for free.

Results are close enough between successive runs that a search would almost always return the same top results, and embedding quality is somewhat subjective anyway…
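The hashing suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: `content_key` and `filter_new` are hypothetical helper names, the whitespace and Unicode normalization is one possible choice rather than a requirement, and `seen_keys` stands in for whatever store of already-ingested keys you keep.

```python
import hashlib
import unicodedata


def content_key(text: str) -> str:
    """Return a stable SHA-256 hex digest for a piece of text."""
    # Assumption: normalize Unicode and collapse whitespace so that
    # trivial formatting differences do not produce a different key.
    normalized = unicodedata.normalize("NFC", text)
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_new(documents, seen_keys):
    """Yield (key, text) only for documents whose key is not in seen_keys.

    seen_keys is a set that is updated in place, so duplicates inside
    the same batch are skipped as well.
    """
    for text in documents:
        key = content_key(text)
        if key in seen_keys:
            continue  # already ingested: no embedding call needed
        seen_keys.add(key)
        yield key, text
```

Only the texts that pass through `filter_new` would then be sent to the embeddings API. If the index overwrites documents that share a key, the digest could also serve as the document key, so that re-sending the same content replaces the existing entry instead of adding a second one.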

Thanks for that response. Very useful.

I am aware of hashing techniques and that I can use them to make sure I am not embedding the same content multiple times. I was wondering whether there is a way to avoid having to hash the content separately.

The only other way I see would be more expensive and slower: pay for the embedding anyway, run an exhaustive search, and skip the insert if a stored result scores above 0.999 similarity and the text returned from the database matches.
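As a rough sketch of that check, assuming cosine similarity as the score and an in-memory list of `(vector, text)` pairs standing in for the database (`cosine_similarity` and `is_duplicate` are hypothetical names, and a real index would run the search itself):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_duplicate(new_vector, new_text, stored, threshold=0.999):
    """Return True if some stored entry is both near-identical and same text.

    stored is an iterable of (vector, text) pairs; this linear scan is
    the "exhaustive search", which is why the approach is slow.
    """
    for vector, text in stored:
        if cosine_similarity(new_vector, vector) > threshold and text == new_text:
            return True
    return False
```

Note that the exact text comparison does the real work here. The similarity threshold only narrows the candidates, which is why a hash of the text gets the same answer without an embedding call.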
