Embedding does not capture negative expression?

tansan78 · January 9, 2024, 4:59pm

I am trying to use embeddings to make recommendations based on user-provided preferences. I found that embeddings do not seem able to capture negative expressions, like “I don’t like…”.

I would appreciate it if anyone could suggest a solution.

Example:

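>>> import numpy as np
>>> from openai import OpenAI
>>> client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

>>> m1 = 'this is a channel about football game'
>>> m2 = 'I like sports'
>>> m3 = 'I do not like football'

>>> # embed all three sentences in one request
>>> a = client.embeddings.create(input=[m1, m2, m3], model="text-embedding-ada-002")
>>> a1, a2, a3 = a.data[0].embedding, a.data[1].embedding, a.data[2].embedding
>>> # similarity of the channel description to m2 and to m3
>>> np.dot(a1, a2), np.dot(a1, a3)
(0.8131731681972549, 0.8261072236929238)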

As you can see from the above example, m3 expresses a dislike of football, but its similarity to the football channel is very high, even higher than that of m2.

This could cause me to wrongly recommend a football channel to a user who dislikes football.

anon22939549 · January 9, 2024, 6:13pm

You are seeing these results because m1 and m3 are both about football, which makes them more semantically similar to each other than m1 is to a statement about sports in general.

This is expected behaviour.

bruce.dambrosio · January 9, 2024, 9:06pm

I think the classic answer is to subtract.
So, embedding(‘sports’) - embedding(‘football’)
You will probably have to renormalize.
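
A minimal sketch of the subtract-and-renormalize idea, using made-up 3-d vectors in place of real embeddings so it runs without an API call. The helper names `normalize` and `subtract_and_renormalize` are hypothetical, and nothing here shows that the resulting vector is useful for retrieval with ada-002.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def subtract_and_renormalize(keep, remove):
    """Return the unit vector for keep - remove."""
    return normalize([a - b for a, b in zip(keep, remove)])

# Made-up vectors standing in for embedding('sports') and embedding('football').
sports = normalize([0.9, 0.4, 0.1])
football = normalize([0.8, 0.6, 0.0])

# Unit-length query vector that points away from 'football' and toward 'sports'.
query = subtract_and_renormalize(sports, football)
```

By construction the query scores higher against `sports` than against `football`. Whether it scores well against anything else is a separate question.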

curt.kennedy · January 9, 2024, 9:30pm

Have you tried this with ada-002?

I’m asking because the (normalized) delta vector could easily put you outside ada-002’s small, concentrated hyper-cone, so that any new embedding would be far away from it.

The only way to get close to this would be to subtract other things, and see if it aligns with this “out of bounds” newly created vector.

Here is a rough visualization:

The new delta vector is pointing way outside of the cone in this picture. So any new things you embed will not align with this new vector, since they all live in the patch.
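
A toy illustration of this concern, assuming made-up 3-d vectors that all share one dominant direction (an imitation of a narrow cone, not real ada-002 output). Subtracting two such vectors cancels the shared direction, so the normalized delta ends up nearly orthogonal to everything inside the cone.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

# Made-up vectors: a large shared first component plus small differences.
sports = normalize([10.0, 1.0, 0.0])
football = normalize([10.0, 0.0, 1.0])
tennis = normalize([10.0, 0.5, 0.5])

# The subtraction removes the shared component almost entirely.
delta = normalize([a - b for a, b in zip(sports, football)])

# Similarity between two points inside the cone (high).
inside = cosine(sports, football)
# Similarity between the delta vector and points inside the cone (near zero).
outside = [cosine(delta, v) for v in (sports, football, tennis)]
```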

bruce.dambrosio · January 9, 2024, 9:38pm

nope, haven’t tried. Valid concern, for sure.

tansan78 · January 10, 2024, 2:39am

Are you suggesting that OpenAI embeddings capture only word-level semantics (like a word-to-word match for “football” or “sports”)?

If so, this is a little disappointing. I thought OpenAI embeddings could capture sentence-level semantics (like the meaning of a whole sentence).

anon22939549 · January 10, 2024, 3:12am

That’s not what I’m suggesting at all. But, if you need to find the relevant context for someone who says they don’t like football, the fact a channel is about football is more important for them than it is to someone who says they like sports—if for no other reason than the model can ensure it doesn’t recommend that channel.

Do you understand now?

tansan78 · January 10, 2024, 4:39am

Thanks a lot for helping!

I am a newbie in this domain. Here is my thought:

For relevance (always positive), what you said makes sense. m1 and m3 are definitely relevant.
Similarity (cosine or dot product), on the other hand, can be negative, and in this case I think it should be negative.

Maybe none of this matters. The important thing for me is to find a solution for my recommendations.

Foxalabs · January 10, 2024, 12:03pm

Hmm, did you just sneak a deathstar into the chat?

RonaldGRuckus · January 10, 2024, 4:30pm

The keyword is “essence” of semantics.

It’s slightly counter-intuitive. I fell into this pitfall at first as well. My first attempt was “Hot” and “Cold”, thinking they would be completely different. They are, but only by a certain method of measurement. In essence they are very similar: they both represent temperature, they can both be used as measurements, and they can both be used to describe items or people. They can both cause injury.

Imagine you drew “Hot”, and “Cold”, or “Likes”, and “Dislikes”, and then you had to create a brainstorm of all the meanings behind it. You would find that they truly share a lot of the same characteristics.

Same with “likes” and “dislikes”. Both carry the same meaning in essence. The embedding model does not perform the logic that you intuitively want it to.

What you are looking for is 2 separate classifications. One for the sport, one for the preference. This can be done with embeddings and/or LLMs.

You can set points in the embedding space and then see how these items compare. I’m not going to try and fluff the numbers, you can see that “no preference” isn’t perfect, so false positives are an issue.

You would also need to classify all the sports and perform the same comparison tests.
You could put this all together with a fine-tuned model to output {PREFERENCE-SPORT} as well, up to you. Honestly a base model would probably work fine.
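
A rough sketch of the two-classification idea with fixed anchor points, assuming cosine similarity and hypothetical 4-d toy vectors in place of real embeddings. The anchor labels and the helper names `nearest_anchor` and `classify` are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_anchor(vector, anchors):
    """Return the label of the anchor most similar to the vector."""
    return max(anchors, key=lambda label: cosine(vector, anchors[label]))

def classify(vector, preference_anchors, sport_anchors):
    """Run the two classifications independently: preference, then sport."""
    return (nearest_anchor(vector, preference_anchors),
            nearest_anchor(vector, sport_anchors))

# Hypothetical anchors; real ones would be embeddings of short phrases.
PREFERENCE_ANCHORS = {
    "likes": [1.0, 0.0, 0.0, 0.0],
    "dislikes": [0.0, 1.0, 0.0, 0.0],
    "no preference": [0.5, 0.5, 0.0, 0.0],
}
SPORT_ANCHORS = {
    "football": [0.0, 0.0, 1.0, 0.0],
    "tennis": [0.0, 0.0, 0.0, 1.0],
}

# Stand-in for embedding("I do not like football").
statement = [0.1, 0.7, 0.7, 0.1]
result = classify(statement, PREFERENCE_ANCHORS, SPORT_ANCHORS)
```

With these toy numbers `result` is `("dislikes", "football")`. Real embeddings will be far less cleanly separated, which is where the false positives come from.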

But Completion will soon be gone, as OpenAI covers up the ability to spit out training data verbatim.

curt.kennedy · January 10, 2024, 7:03pm

With the embeddings, you should get better results if you correlate to previously known (labeled) embeddings.

So if your input is:

“I do not like football”

And your previous dislike embeddings for the label “Football - Dislikes” are:

“Football is bad”
“Football is dumb”
…
Etc.

Then you correlate your input to all previous labeled embeddings, and select the category corresponding to the highest correlation.

So you can do this with labeled embeddings, instead of a fine-tune, even though the fine-tune isn’t a bad idea either.

The nice thing about the embeddings classifier is that you can add or remove embeddings on the fly, whereas there is no easy “undo” operation in a fine-tune.
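
A minimal sketch of such a labeled-embeddings classifier, assuming cosine similarity as the correlation and toy 3-d vectors in place of real embeddings. The class name `LabeledEmbeddings`, the “Football is great” example, and the “Football - Likes” label are hypothetical additions for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

class LabeledEmbeddings:
    """Store labeled example vectors and classify by the best match."""

    def __init__(self):
        self._items = {}  # example text -> (label, vector)

    def add(self, text, label, vector):
        self._items[text] = (label, vector)

    def remove(self, text):
        # The "undo" operation: just drop the example.
        self._items.pop(text, None)

    def classify(self, vector):
        """Return (label, score) of the most similar stored example."""
        if not self._items:
            raise ValueError("no labeled embeddings stored")
        scored = [(cosine(vector, stored), label)
                  for label, stored in self._items.values()]
        score, label = max(scored)
        return label, score

index = LabeledEmbeddings()
index.add("Football is bad", "Football - Dislikes", [0.1, 0.9, 0.8])
index.add("Football is dumb", "Football - Dislikes", [0.2, 0.8, 0.9])
index.add("Football is great", "Football - Likes", [0.9, 0.1, 0.8])

# Stand-in for embedding("I do not like football").
query = [0.1, 0.8, 0.7]
label, score = index.classify(query)
```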

RonaldGRuckus · January 10, 2024, 7:45pm

I do like this, in theory.

I tried a quick mash-up by throwing them all together and calculating the centroid, and it works similarly to what I suggested above. (I used ChatGPT to generate 5 like and 5 dislike statements.)

Much better for combining the sport and preference, to be fair.
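
The centroid variant might look roughly like this, assuming a plain component-wise mean per label and toy 3-d vectors in place of the generated statements' embeddings. The labels and the helper names are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_centroid(vector, centroids):
    """Return the label whose centroid is most similar to the vector."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

# Toy vectors standing in for embeddings of generated statements.
examples = {
    "Football - Likes": [[0.9, 0.1, 0.8], [0.8, 0.2, 0.9], [0.9, 0.2, 0.7]],
    "Football - Dislikes": [[0.1, 0.9, 0.8], [0.2, 0.8, 0.9], [0.1, 0.8, 0.7]],
}
centroids = {name: centroid(vecs) for name, vecs in examples.items()}

query = [0.2, 0.9, 0.8]  # stand-in for a new "dislike" statement
label = nearest_centroid(query, centroids)
```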

How do you plan to filter out non-preferential or unrelated queries with this? They would be mixed in with everything else.

curt.kennedy · January 10, 2024, 7:54pm

The “None” case could be inferred by thresholding the correlation.

So if max is less than 0.8 (for example), then declare the “None” case. Just pick a good threshold based on the data you are seeing.

With enough labeled embeddings, you would expect something to pop above 0.8 (or whatever the threshold should be).

With enough labeled data, over time, you can push the threshold up, to say, 0.9.

So this is a data driven approach that needs adjusting based on how much labeled data you have.
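
As a small sketch, the thresholding step could look like this. The function name `pick_label` is hypothetical, and the input is assumed to be a mapping from each label to its best correlation with the query.

```python
def pick_label(correlations, threshold=0.8):
    """Return the best label, or "None" if nothing reaches the threshold.

    correlations: dict mapping label -> highest correlation seen for it.
    threshold: tune this on your own data; 0.8 is only the example value.
    """
    if not correlations:
        return "None"
    best = max(correlations, key=correlations.get)
    return best if correlations[best] >= threshold else "None"

confident = pick_label({"Football - Dislikes": 0.86, "Football - Likes": 0.79})
unrelated = pick_label({"Football - Dislikes": 0.71, "Football - Likes": 0.69})
```

Here `confident` is `"Football - Dislikes"` and `unrelated` is `"None"`. Raising `threshold` to 0.9 would turn the first case into `"None"` as well.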

RonaldGRuckus · January 10, 2024, 8:03pm

Nice. Yes this does work quite nicely

I like this a lot. Much more dynamic.

curt.kennedy · January 10, 2024, 8:27pm

Yes, also this embedding classifier approach costs less than running a fine-tune.

Also, this approach can even be used without web access if you run small embedding models locally.

It is also easily scalable to run in parallel (shard each embedding set and run the shards in parallel).

So more dynamic + less FLOPS (smaller hardware footprint) + local options for remote applications + easily scalable.

Also, you can hybridize this with multiple embedding engines at the same time and combine the rankings with RRF. You can also run a non-inference-based keyword version as an absolute worst-case backup if all your inference paths are down.

So in ops, your system is not reliant on only one embedding engine. It can run several at once, and an outage of any one embedding engine won’t take you down (at worst, you fall back on your local non-inference keyword backup). So you get as much redundancy as you want and massive uptimes.
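
A minimal sketch of RRF (Reciprocal Rank Fusion) over several ranking streams, where an engine that is down is passed as `None` and skipped. The constant `k=60` is a commonly used default and is an assumption here, as are the label names and the three example rankings.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists (best first) by summing 1 / (k + rank) per item.

    rankings: list of ranked lists; None marks an unavailable engine.
    Returns the items ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        if ranking is None:  # engine outage: skip this stream
            continue
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two embedding engines and a keyword backup.
engine_a = ["Football - Dislikes", "Football - Likes", "Tennis - Likes"]
engine_b = ["Football - Dislikes", "Tennis - Likes", "Football - Likes"]
keyword = ["Football - Likes", "Football - Dislikes"]

fused = reciprocal_rank_fusion([engine_a, engine_b, keyword])
# If both embedding engines are down, only the keyword stream is used.
fallback = reciprocal_rank_fusion([None, None, keyword])
```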

So lots more advantages than meets the eye

RonaldGRuckus · January 10, 2024, 8:36pm

And the powerful analytics from a transparent galactic box of queries already in their respective positions. Any edge cases could easily be routed to an LLM in the worst case.

I’m an embedding believer. All the way.

What do you think of RSF (Relative Score Fusion)?

In developing these two algorithms, we carried out some internal benchmarks testing recall on a standard (FIQA) dataset. According to our internal benchmarks, the relativeScoreFusion algorithm showed a ~6% improvement in recall over the default rankedFusion method.

Which apparently (haven’t tried this) works well with this cool newish feature of “Autocut”:

In 1.20 we introduced the AutoCut feature, which can intelligently retrieve groups of objects from a search. AutoCut relies on there being natural “clusters” (groups of objects with close scores).

AutoCut works well with relativeScoreFusion, which often results in natural clusters that autocut can detect.

curt.kennedy · January 10, 2024, 8:51pm

It makes sense. It basically keeps the relative scores intact instead of just ordering by unit integer distances (so use 1, 0.33, 0 and not 1, 2, 3).

Also should mention, one really cool thing about RSF over RRF is that when you have a case where the embeddings are all similar, but the keyword leg has more differentiation, the keyword leg will determine the winner, which is a trait that I really like!

So you don’t get messed up results if one thing is relatively close, across the board, in its ranking stream.

Also, I like that there is no reciprocal, or division, it’s just a straight up weighted sum. So if you partition the weightings to sum up to 1, you get an absolute score from 0 to 1, which is nice!
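
A rough sketch of this kind of score fusion, assuming each leg is min-max scaled to [0, 1] before the weighted sum (consistent with the 1, 0.33, 0 example). The function names, item names, and scores are made up, and this is not taken from any particular library's implementation.

```python
def min_max(scores):
    """Scale one leg's scores so the best is 1 and the worst is 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # No differentiation in this leg; this sketch lets it contribute 0.
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}

def relative_score_fusion(legs, weights):
    """Weighted sum of min-max scaled scores; missing items contribute 0."""
    fused = {}
    for leg, weight in zip(legs, weights):
        for item, scaled in min_max(leg).items():
            fused[item] = fused.get(item, 0.0) + weight * scaled
    return fused

# Two items are nearly tied in the embedding leg...
embedding_leg = {"channel_a": 0.82, "channel_b": 0.819, "channel_c": 0.50}
# ...while the keyword leg separates them clearly.
keyword_leg = {"channel_a": 1.0, "channel_b": 5.0, "channel_c": 0.0}

# Weights sum to 1, so every fused score stays between 0 and 1.
fused = relative_score_fusion([embedding_leg, keyword_leg], weights=[0.5, 0.5])
winner = max(fused, key=fused.get)
```

Here `winner` is `"channel_b"`: the near-tie in the embedding leg survives scaling (1.0 versus roughly 0.997), so the keyword leg decides. A rank-based fusion would see only ranks 1 and 2 in each leg and score the two items the same.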

Intuitively, it’s a better fusion algorithm, so I’d have to try it out!

bruce.dambrosio · January 11, 2024, 6:53pm

I find that every time I try to use actual scores for fusion, I quickly run into a counter-example where it does the reverse of what I want. Sigh. That Weaviate autocut feature sounds pretty cool, though…
