Use embeddings to measure how well an answer fits the question

Hi,
Based on this old TensorFlow repo, github[.com]/tensorflow/tfjs-models/tree/master/universal-sentence-encoder, I thought I would try using embeddings computed with OpenAI to measure how well an answer fits a given question. I've run some tests, but the results seem inconsistent. Here are some examples:
First example:

| model | question | answer | dot product |
|---|---|---|---|
| text-embedding-3-small | what's your name? | the pen is on the table | 0.2100624394110477 |
| text-embedding-3-large | what's your name? | the pen is on the table | 0.15766028555933212 |
| text-embedding-ada-002 | what's your name? | the pen is on the table | 0.7728555090641087 |

Since the answer is unrelated to the question, a lower score is better here: text-embedding-3-large behaves much better than text-embedding-ada-002.

Second example:

| model | question | answer | dot product |
|---|---|---|---|
| text-embedding-3-small | what's your name? | my name is marco | 0.4585776699017621 |
| text-embedding-3-large | what's your name? | my name is marco | 0.43940777514817453 |
| text-embedding-ada-002 | what's your name? | my name is marco | 0.8278902967827175 |

Here the answer does fit, so a higher score is better: text-embedding-ada-002 behaves much better than text-embedding-3-small, which behaves slightly better than text-embedding-3-large.

Third example:

| model | question | answer | dot product |
|---|---|---|---|
| text-embedding-3-small | I've broken my laptop, what can I do? | come to our store to have some assistance | 0.1888791306336703 |
| text-embedding-3-large | I've broken my laptop, what can I do? | come to our store to have some assistance | 0.1511331633002332 |
| text-embedding-ada-002 | I've broken my laptop, what can I do? | come to our store to have some assistance | 0.7774396662317862 |

Again, text-embedding-ada-002 behaves much better than text-embedding-3-small, which behaves slightly better than text-embedding-3-large.

What are your thoughts? My code is pretty simple:

from openai import OpenAI
import numpy as np

client = OpenAI(api_key=api_key)
models = ["text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"]
texts = ["I've broken my laptop, what can I do?", "come to our store to have some assistance"]
for m in models:
    resp = client.embeddings.create(input=texts, model=m)
    embedding_a = resp.data[0].embedding
    embedding_b = resp.data[1].embedding
    # embeddings are unit-length, so the dot product is the cosine similarity
    similarity_score = np.dot(embedding_a, embedding_b)
    print(m, "|", texts[0], "|", texts[1], "|", similarity_score)

Do you think the dot product is the right measure here? Based on platform.openai[.com]/docs/guides/embeddings/which-distance-function-should-i-use I thought so, but the results don't look good to me.
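
For what it's worth, the docs say OpenAI embeddings are normalized to length 1, so the dot product and the cosine similarity should give the same number. A quick sanity check sketch (it assumes api_key is defined as in the snippet above):

from openai import OpenAI
import numpy as np

client = OpenAI(api_key=api_key)  # assumes api_key is already defined
resp = client.embeddings.create(
    input=["what's your name?", "my name is marco"],
    model="text-embedding-3-small",
)
a = np.array(resp.data[0].embedding)
b = np.array(resp.data[1].embedding)

print(np.linalg.norm(a), np.linalg.norm(b))  # both should be ~1.0
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.dot(a, b), cosine)  # should be (almost) identical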

The older ada-002 model typically only produces similarities in a narrow 0.7–0.9 band, I believe, while the newer models spread their scores across a much more typical 0–1 range.
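
So rather than reading raw numbers across models, one option is to compare each model against its own baseline of unrelated pairs. A rough sketch, where the unrelated sentences are just made-up examples:

from openai import OpenAI
import numpy as np

client = OpenAI(api_key=api_key)  # assumes api_key is already defined

question = "what's your name?"
answer = "my name is marco"
# made-up unrelated sentences used as a per-model baseline
unrelated = ["the pen is on the table", "it will rain tomorrow", "the oven needs cleaning"]

for m in ["text-embedding-3-small", "text-embedding-3-large", "text-embedding-ada-002"]:
    resp = client.embeddings.create(input=[question, answer] + unrelated, model=m)
    vecs = [np.array(d.embedding) for d in resp.data]
    q, a, noise = vecs[0], vecs[1], vecs[2:]
    score = np.dot(q, a)
    baseline = np.mean([np.dot(q, n) for n in noise])
    # the margin above the model's own baseline is more comparable than the raw score
    print(m, "| score:", round(score, 3), "| baseline:", round(baseline, 3), "| margin:", round(score - baseline, 3))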

OK, but I would have expected the new -3 models to produce somewhat higher scores. In the second example they didn't even reach a similarity of 0.5, which isn't great, and even in the third example the scores are pretty low.

Instead of taking the values at face value, try comparing them against more entries. Embeddings capture the essence of semantics. Are there connections between a question and an answer that fits it? Definitely. But there are also a lot of other connections being considered at the same time.
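
For example, here is a sketch that scores several candidate answers against one question and looks at the ranking rather than the absolute values (the candidate answers are made up for illustration):

from openai import OpenAI
import numpy as np

client = OpenAI(api_key=api_key)  # assumes api_key is already defined

question = "I've broken my laptop, what can I do?"
# made-up candidate answers; only some of them really fit the question
candidates = [
    "come to our store to have some assistance",
    "try restarting it and check whether it is still under warranty",
    "our pizzas are baked in a wood-fired oven",
    "the pen is on the table",
]

resp = client.embeddings.create(input=[question] + candidates, model="text-embedding-3-small")
q = np.array(resp.data[0].embedding)
scores = [(c, float(np.dot(q, np.array(d.embedding)))) for c, d in zip(candidates, resp.data[1:])]

# what matters is which candidate ranks highest, not the absolute similarity value
for c, s in sorted(scores, key=lambda x: x[1], reverse=True):
    print(round(s, 3), "|", c)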

Trying to find an answer for a question is the basis of RAG: using unstructured semantics through embeddings.

Do you have an example in mind?

I guess I should first ask: what exactly are you trying to capture?

  1. Whether the answer fits in a grammatical sense, or
  2. Whether it is correct because it's drawn from a knowledge database.

For 1, I would gather two groups of >100 datapoints (easy to do with GPT): one group of question/answer combinations that make sense, labelled "fits", and another group of incoherent combinations, labelled "doesn't fit". Calculate a centroid for each group (which would ideally amplify the important dimensions you want to focus on and dampen the wildly varying ones), and then classify a new pair by comparing it against the two centroids. Pretty straightforward; see the sketch below.
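
A minimal sketch of that centroid approach, assuming the labelled pairs are embedded as concatenated question + answer strings (the handful of examples below stand in for the >100 GPT-generated datapoints):

from openai import OpenAI
import numpy as np

client = OpenAI(api_key=api_key)  # assumes api_key is already defined
MODEL = "text-embedding-3-small"

def embed(texts):
    resp = client.embeddings.create(input=texts, model=MODEL)
    return np.array([d.embedding for d in resp.data])

# stand-ins for the >100 GPT-generated datapoints per label
fits = [
    "what's your name? my name is marco",
    "I've broken my laptop, what can I do? come to our store to have some assistance",
]
does_not_fit = [
    "what's your name? the pen is on the table",
    "I've broken my laptop, what can I do? our pizzas are baked in a wood-fired oven",
]

# centroid of each labelled group, renormalized so dot products behave like cosine similarity
centroid_fit = embed(fits).mean(axis=0)
centroid_fit /= np.linalg.norm(centroid_fit)
centroid_no_fit = embed(does_not_fit).mean(axis=0)
centroid_no_fit /= np.linalg.norm(centroid_no_fit)

def answer_fits(question, answer):
    v = embed([question + " " + answer])[0]
    # "fits" if the pair is closer to the fits-centroid than to the doesn't-fit one
    return np.dot(v, centroid_fit) > np.dot(v, centroid_no_fit)

print(answer_fits("what's your name?", "my name is marco"))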

If you’re looking for QA this can be found anywhere