Is it possible to set weights for different parts of the text sent to the Embedding API?

I want to compare multiple vectors returned from the Embedding API, but I want to put emphasis on certain parts of the text sent to it, because I believe these parts will be more defining for each entity. I believe this is not possible, right?

Moreover, I would like your opinions on what I am trying to achieve. Does it make sense? Should I leave it to GPT to automatically emphasize what is more defining for each text? Should this be approached by testing and comparing the results?


To my knowledge, you cannot change the emphasis of the Embedding API, because all it does is give you a vector representation of the text from a pre-trained model.

So to change the emphasis, you would have to train your own model. See "Training Overview" in the Sentence-Transformers documentation.

But in most cases, this is not necessary. What is your use case?

Thanks for the info. I get it.

Another option I thought of would be to take the specific term that I want to emphasize and get an embedding vector for it individually. Then I would join the two vectors (the original plus the specific text that I want to emphasize), either by summing or by appending. Any thoughts on this idea?
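
To make the idea concrete, here is a minimal sketch of both ways of joining the vectors. It uses made-up toy vectors instead of real API output, and the function names, the `weight` parameter, and the normalization step are my own assumptions for illustration, not anything the Embedding API provides.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def weighted_sum(full_vec, term_vec, weight=0.5):
    """Blend the full-text vector with the emphasized-term vector.

    `weight` is a hypothetical knob: 0 ignores the term, larger values
    pull the result toward the term's direction.
    """
    if len(full_vec) != len(term_vec):
        raise ValueError("vectors must have the same dimension")
    a, b = normalize(full_vec), normalize(term_vec)
    return normalize([x + weight * y for x, y in zip(a, b)])

def weighted_concat(full_vec, term_vec, weight=0.5):
    """Append the scaled term vector to the full-text vector.

    The result has the combined dimension of both inputs, so every
    vector you compare against must be built the same way.
    """
    a, b = normalize(full_vec), normalize(term_vec)
    return a + [weight * y for y in b]

# Toy 3-dimensional vectors standing in for real embeddings.
full_text = [0.2, 0.9, 0.1]
key_term = [0.9, 0.1, 0.0]

summed = weighted_sum(full_text, key_term, weight=1.0)
joined = weighted_concat(full_text, key_term, weight=1.0)
```

With the appended version, the dot product of two joined vectors works out to the full-text cosine similarity plus `weight ** 2` times the term cosine similarity, so `weight` directly controls how much the emphasized term counts. Whether either variant actually improves the results would still have to be tested on real data.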

My case is simply to find the other vectors closest to my original vector. However, I want to give emphasis to certain elements of the full body text, so that a specific piece of text is weighted more heavily when finding the closest vector.
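
For reference, the "closest vector" search can be sketched as a cosine similarity ranking. The vectors and the names `doc_a`, `doc_b`, `doc_c` are invented placeholders, and `rank_by_similarity` is a hypothetical helper, not an API call.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder vectors standing in for stored embeddings.
candidates = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

ranking = rank_by_similarity(query, candidates)
closest_name = ranking[0][0]  # "doc_a" for these toy vectors
```

Any emphasis scheme would have to be applied to both the query vector and the stored vectors before this ranking step, otherwise the two sides are not comparable.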

Ya, I see what you mean.
I was thinking this might be useful for you, but you might have to make minor changes.
Let me know how it goes.

@nelson you are great! ; )

I could not quite understand the relation between the Embeddings API and completion in the spreadsheet. To me they seem like two different and independent tasks.

@dogao That is correct, they are two different tasks; you can use them independently or together.
The embeddings feature is used to search text, so if you have a lot of content you can use the gsheet/data tab as a database. The completion feature only calls the API, and by using gsheet you can chain the API requests and responses together to create your own workflow.
