Hello there!
I have a quick question about setting a similarity threshold for query embeddings with the text-embedding-3-large model.
Previously I was using the text-embedding-ada-002 model with an index dimension of 1536, and a query search threshold of 0.79, chosen based on other benchmarks out there.
I switched to the new model but used the dimensions parameter so that its 3072-dimension embeddings fit into the 1536-dimension index, like below (sample code):
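from langchain_openai import AzureOpenAIEmbeddings  # assuming the langchain-openai package

embedding_model = AzureOpenAIEmbeddings(
    deployment="text-embedding-3-large",
    model="text-embedding-3-large",
    openai_api_base="xxx",  # placeholder for the endpoint
    openai_api_type="azure",
    dimensions=1536,  # shorten the 3072-dimension output to fit the index
)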
Then I embedded my queries the same way, with the same text-embedding-3-large model (sample code):
embedding_model = AzureOpenAIEmbeddings(
    deployment="text-embedding-3-large",
    openai_api_type="azure",
    dimensions=1536,  # must match the dimension used when indexing
)
I didn’t change the threshold and kept it at 0.79, but I got 0 hits, which was weird. Then, just to test, I changed it to 0.079 and got plenty of hits.
May I ask why this is happening? Do similarity scores from text-embedding-3-large range from 0 to 0.1, unlike ada-002, where they ranged from 0 to 1? Can I just use 0.079 instead, or do I need more evaluation?
Also, do you recommend forcing the dimensions to 1536 when upserting vectors into a Pinecone index while using a 3072-dimension model?
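Since a usable threshold depends on how scores are actually distributed for a given model and metric, one way to sanity-check a value like 0.079 is to look at the score range over a sample of query/document pairs. Here is a minimal sketch in plain Python, assuming cosine similarity is the metric and using toy vectors in place of real embeddings. `cosine_similarity` and `summarize_scores` are hypothetical helper names, not part of any library:

```python
import math
import statistics


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def summarize_scores(query_vec, doc_vecs):
    """Return the min, median, and max similarity of one query against many documents."""
    scores = sorted(cosine_similarity(query_vec, d) for d in doc_vecs)
    return {
        "min": scores[0],
        "median": statistics.median(scores),
        "max": scores[-1],
    }


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical data).
query = [1.0, 0.0, 0.0]
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

summary = summarize_scores(query, docs)
print(summary)
```

With real data, the vectors would come from the embedding model, and the comparison should use the same metric the Pinecone index is configured with. Otherwise the score ranges are not comparable.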