I noticed that when querying the same vector store with the same prompt using retrieval, I get different results. I couldn't find any relevant explanation in OpenAI's official documentation, but theoretically the similarity score calculated between the same prompt and each individual chunk should be consistent. Can someone explain the reason for this? Thank you.
The embeddings model, which returns a vector for a given input, is not deterministic: every run can return a slightly different vector. That means semantic similarity scores that are close together can flip position between trials of the same input, because of those small variations in the returned embedding vector.
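As a toy illustration (synthetic scores, not real API output; the noise magnitude is an assumption chosen only to make the effect visible), here is a sketch of how two near-tied chunks can trade places between runs:

import numpy as np

rng = np.random.default_rng(0)

# Two chunks whose "true" similarity to the query is nearly tied
true_scores = {"chunk_A": 0.8124, "chunk_B": 0.8121}

for trial in range(10):
    # Simulate the small run-to-run jitter that non-deterministic
    # embeddings introduce into the similarity scores
    noisy = {k: v + rng.normal(scale=1e-3) for k, v in true_scores.items()}
    top = max(noisy, key=noisy.get)
    print(f"trial {trial}: top = {top}, "
          + ", ".join(f"{k}={v:.6f}" for k, v in noisy.items()))

Over ten trials the top result will typically flip at least once, even though nothing about the query or the chunks changed.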
If you are not calling the vector store's semantic search endpoint directly, you also have an AI model writing its own search query, which adds even more variation from turn to turn, since different conversation inputs drive what the AI generates for its tool call.
OpenAI could explain the reason, but doesn’t.
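If you want to take the query-writing model out of the loop, you can search the store directly. A minimal sketch, assuming a recent openai-python SDK that exposes the vector store search endpoint ("vs_..." is a placeholder store ID):

from openai import OpenAI

client = OpenAI()

# Search the vector store directly, so no chat model rewrites the
# search query between runs ("vs_..." is a placeholder ID)
results = client.vector_stores.search(
    vector_store_id="vs_...",
    query="your prompt here",
)

for hit in results.data:
    print(hit.score, hit.filename)

This removes the query-generation variance, though the embedding of the query itself can still vary slightly between runs.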
I got it, thanks. I've tested the embedding results using the code listed below:
import openai
import numpy as np
from typing import List
from sklearn.metrics.pairwise import cosine_similarity

client = openai.OpenAI()

def get_embedding(text: str, model: str = "text-embedding-3-small") -> List[float]:
    """Get an embedding vector from the OpenAI API."""
    response = client.embeddings.create(
        input=text,
        model=model,
    )
    return response.data[0].embedding

def test_embedding_consistency(text: str, num_trials: int = 5, model: str = "text-embedding-3-small") -> None:
    """
    Test the consistency of OpenAI embeddings by comparing multiple runs
    of the same text input.

    Args:
        text: the text to be embedded and compared
        num_trials: the number of times to run the embedding
        model: the OpenAI model to use for embedding
    """
    print(f"Test sentence: {text}")
    print(f"Num trials: {num_trials}")
    print(f"Using model: {model}\n")

    # Retrieve the embedding of the same text multiple times
    embeddings = []
    for i in range(num_trials):
        embedding = get_embedding(text, model)
        embeddings.append(embedding)
        print(f"Vector {i+1} retrieved, vector length: {len(embedding)}")

    # Convert embedding vectors to numpy arrays for comparison
    embeddings = [np.array(emb) for emb in embeddings]

    # Compare all embedding vectors for exact element-wise equality
    is_consistent = True
    for i in range(1, num_trials):
        if not np.array_equal(embeddings[0], embeddings[i]):
            is_consistent = False
            print(f"Check failed! Embedding {i+1} differs from the first embedding.")
            break

    if is_consistent:
        print("\nResult: All embedding vectors are completely consistent!")
    else:
        print("\nResult: Embedding vectors are inconsistent!")

    # Calculate cosine similarity between the first two embeddings (as an additional check)
    similarity = cosine_similarity([embeddings[0], embeddings[1]])[0][1]
    print(f"\nCosine similarity between the first and second embeddings: {similarity:.6f}")

if __name__ == "__main__":
    # Test sentence for embedding
    test_sentence = (
        "Embeddings, the AI model that returns a vector "
        "against the input, is not deterministic. Every run "
        "can be different. That means that semantic "
        "similarity scores that are close can flip position "
        "between trials of the same input with those value "
        "variations in the embeddings vector return. \n\n"
        "If you are not directly using the semantic search "
        "vector store endpoint, you are also having an AI "
        "model create its own search query, with even more "
        "variation between turns with different inputs driving "
        "what the AI generates its tool call based on."
    )

    # Execute test
    test_embedding_consistency(test_sentence, num_trials=5)
And got this output:
Test sentence: Embeddings, the AI model that returns a vector against the input, is not deterministic. Every run can be different. That means that semantic similarity scores that are close can flip position between trials of the same input with those value variations in the embeddings vector return.
If you are not directly using the semantic search vector store endpoint, you are also having an AI model create its own search query, with even more variation between turns with different inputs driving what the AI generates its tool call based on.
Num trials: 5
Using model: text-embedding-3-small
Vector 1 retrieved, vector length: 1536
Vector 2 retrieved, vector length: 1536
Vector 3 retrieved, vector length: 1536
Vector 4 retrieved, vector length: 1536
Vector 5 retrieved, vector length: 1536
Check failed! Embedding 2 differs from the first embedding.
Result: Embedding vectors are inconsistent!
Cosine similarity between the first and second embeddings: 1.000000
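So the vectors differ, yet the cosine similarity still rounds to 1.000000, meaning the run-to-run differences are tiny. A quick follow-up check (a sketch reusing get_embedding from the script above) makes their magnitude visible:

import numpy as np

# Embed the same short text twice (reuses get_embedding from above)
a = np.array(get_embedding("the same text, embedded twice"))
b = np.array(get_embedding("the same text, embedded twice"))

print("bit-identical:", np.array_equal(a, b))
# Expect nonzero differences small enough that cosine similarity
# between the two vectors still rounds to 1.0
print("max abs difference:", np.max(np.abs(a - b)))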
Thanks.