Text Embedding Results Based on Priority

I am new to OpenAI. I have integrated the text-embedding-ada-002 model into AWS OpenSearch for semantic search.

I want the embedding search results ordered by a field priority that I define. For example, say my document looks like this:
{
  "title": "random",
  "desc": "random"
}
If I search for the word “abc” and my priority order is title > description > etc., then the results should be ordered by which field produced the match.

So if two documents are found, one because “abc” matches in the description and the other because “abc” matches in the title, the document matched on the title should come first in the results, irrespective of the order of the documents in the database.

This is just an example; there would be lots of fields…

Is there any solution?

Another problem: even when an exact match is found, the score is only about 0.7, which I think is low.

Thanks in advance

Welcome to the community!

Welcome to the world of embeddings!

Embeddings don’t exactly match words. They’re supposed to encode the concept or meaning of whatever the document contains. And the cosine similarity is supposed to indicate the similarity between concepts, and not whether certain words appear.

While there are some limitations to this, this means that you can sorta search across languages, even if the languages have nothing in common. To a degree.

So, if you want to “only match the title” or “only the description”, you’ll have to embed the title and the description separately and keep an index for each. You can then use a rank fusion algorithm, maybe Reciprocal Rank Fusion with a bias, to lift certain fields over others.
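To make the rank fusion idea a bit more concrete, here is a minimal sketch of Reciprocal Rank Fusion with per-field weights. It assumes you have already run one vector search per field and have an ordered list of document IDs for each. The function name weighted_rrf, the field names, the weights and the document IDs are all made up for illustration, and k=60 is just a commonly used smoothing constant, not something specific to OpenSearch or OpenAI.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, k=60):
    """Fuse per-field rankings into a single ordered list of doc ids.

    rankings: dict of field name -> list of doc ids, best match first
    weights:  dict of field name -> weight (higher means more priority)
    k:        smoothing constant (assumed default, tune as needed)
    """
    scores = defaultdict(float)
    for field, doc_ids in rankings.items():
        w = weights.get(field, 1.0)
        for rank, doc_id in enumerate(doc_ids, start=1):
            # Standard RRF term 1 / (k + rank), scaled by the field weight
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results of two separate searches, one per field index
rankings = {
    "title": ["doc_b", "doc_c"],
    "desc": ["doc_a", "doc_b"],
}
weights = {"title": 3.0, "desc": 1.0}  # title > description

fused = weighted_rrf(rankings, weights)
print(fused)
```

Note that this is a bias, not a strict priority: a document that ranks well in several low-priority fields can still add up to a high score. If you need title matches to always win, you would have to make the title weight large enough relative to the others, or sort by field first and by score second.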

A score of about 0.7 for an exact match is indeed pretty low. With ada-002, stuff around 0.7 is pretty much useless. I would investigate what you’re actually embedding and comparing; it’s possible that you have a logic issue here.

You could also consider using the new text-embedding-3 models; their scores are a little easier to interpret.
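On the low score, one quick sanity check for the comparison logic is to compute the cosine similarity yourself on vectors you control. The sketch below uses small made-up vectors rather than real embeddings; the point is only that identical inputs should score very close to 1.0. If your “exact match” comes out around 0.7, one possible explanation (an assumption on my part, since I can't see your setup) is that the query “abc” is being compared against the embedding of a whole document or several concatenated fields, which is not the same text.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings
v = [0.1, 0.3, -0.2]

same = cosine_similarity(v, v)  # identical input, expect ~1.0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated directions

print(same, orthogonal)
```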


Yes, I do mean similarity, but the results should be ordered by the priority I want. That is, if “abc” is similar to the field with top priority (title, in the example), then that document should be at the top of my results.

Prioritizing the fields is my main concern here.

Well, yea, what I’m telling you is if you want to discriminate by field, you’ll need to at least embed the fields separately.

How were you figuring it could work?