Embeddings support for numbers

ben1787 · March 10, 2023, 5:37pm

Hey community! I was wondering if anyone had thoughts on using the embeddings to represent relative rankings. For example, I have the following query, and 3 pieces of information.

Query: “Which company has the highest ranked Price (local)?”

Sentences:

Ticker AAPL-USA has a Price (local) rank of 8
Ticker ADP-USA has a Price (local) rank of 5
Ticker ADM-USA has a Price (local) rank of 23

And yet, when I sort the sentences by the dot product between the query embedding and each sentence embedding (descending), I get this order, which doesn't follow the ranks. It is very frustrating. Any thoughts on how to improve the sentence construction so the embeddings reflect the content more accurately?

(Secondary question: if I fine-tuned a model on this type of data, would it perform much better?)

Thanks!!
Ben

ruby_coder · March 11, 2023, 2:50am

Hi @ben1787

I don’t think you can easily use word embeddings to rank your tickers in the proper order (by rank), but I could be wrong.

I also don’t think, but I could be wrong, that you will get a language model to provide ranking order by fine-tuning it.

You are better off, in my view, keeping the rankings in a database column and simply querying the DB and sorting to get the rankings.
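A minimal sketch of that idea, using Python's built-in sqlite3 with an in-memory database. The table name `tickers`, the column name `price_rank`, and the helper `rank_tickers` are made up for illustration, and it assumes a lower rank number means "higher ranked":

```python
import sqlite3

def rank_tickers(rows):
    """Return (ticker, price_rank) pairs sorted by rank, lowest number first."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory DB for the example
    conn.execute("CREATE TABLE tickers (ticker TEXT PRIMARY KEY, price_rank INTEGER)")
    conn.executemany("INSERT INTO tickers VALUES (?, ?)", rows)
    # Assumption: rank 1 is the best, so sort ascending.
    result = conn.execute(
        "SELECT ticker, price_rank FROM tickers ORDER BY price_rank ASC"
    ).fetchall()
    conn.close()
    return result

# The three rows from the original question.
rows = [("AAPL-USA", 8), ("ADP-USA", 5), ("ADM-USA", 23)]
ranked = rank_tickers(rows)
top = ranked[0]
print(top)
```

If a higher number means a better rank in your data, switch `ASC` to `DESC`.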

What you are attempting to do is better suited for traditional search and retrieval and not semantic search, in my view.

HTH

ben1787 · March 13, 2023, 12:43am

Sure, that is generally true. Taken on its own, this particular task may be better suited to more standard methods, but it is part of a larger project and I want it all incorporated into the dialogue.

curt.kennedy · March 13, 2023, 3:45am

The problem with vectors is that they don’t readily preserve numerical ordering. It’s hard to say what metric you should use to decide that vector v is greater than vector w. You could try magnitude, but many embedding models return unit vectors (length one), so every vector has the same magnitude. The vectors from embeddings are therefore mostly spatial, and strict numerical ordering is destroyed when all you can compare is the “angles” between vectors that represent numbers. Plus, the embeddings are trained (mostly) on the meaning of text, and are useful for clustering and categorizing chunks of text.
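A small sketch of the unit-vector point, in plain Python. The vectors here are random stand-ins, not real embeddings, and the helper names are made up for illustration. After normalization every vector has length one, so magnitude cannot order anything, and the dot product is just the cosine of the angle, always between -1 and 1:

```python
import math
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to length one, as many embedding models do with their output."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

random.seed(0)
# Hypothetical 8-dimensional stand-ins for three sentence embeddings.
raw = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
unit = [normalize(v) for v in raw]

# Every normalized vector has the same magnitude, so magnitude carries no order.
norms = [math.sqrt(dot(v, v)) for v in unit]

# With unit vectors, the dot product equals the cosine of the angle between them.
query = unit[0]
scores = [dot(query, v) for v in unit]
print(norms, scores)
```

The scores only say how closely two directions align. Nothing in them tells you that a vector for "5" should come before or after a vector for "8".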

ruby_coder · March 13, 2023, 3:57am

Hi @ben1787

Larger project or not, it is important to use the right software components in each part of the software architecture.

What you are attempting to do should be done with a traditional database lookup and ranking; it is not really the right fit for semantic search.

Just because you want to build a house with a screwdriver does not mean you throw the hammer and nails away.

In your case, you seem to want to use embedding vectors, which are based on a language model, not a numerical model, to rank text that contains a number.

This is really not how embeddings are designed to work.
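If the sentences are all you have, one option is to pull the number back out of the text and sort on that instead. A rough sketch follows; the regex is only a guess based on the three example sentences in the first post, `parse_rank` is a made-up helper name, and it assumes a lower number is the better rank:

```python
import re

# Assumed sentence shape, taken from the three examples in the question.
PATTERN = re.compile(r"Ticker (\S+) has a Price \(local\) rank of (\d+)")

def parse_rank(sentence):
    """Return (ticker, rank) if the sentence matches the assumed shape, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return match.group(1), int(match.group(2))

sentences = [
    "Ticker AAPL-USA has a Price (local) rank of 8",
    "Ticker ADP-USA has a Price (local) rank of 5",
    "Ticker ADM-USA has a Price (local) rank of 23",
]

# Drop anything that did not parse, then sort numerically by rank.
parsed = [p for p in (parse_rank(s) for s in sentences) if p is not None]
ranked = sorted(parsed, key=lambda pair: pair[1])
print(ranked)
```

The sort runs on real integers, so 5 < 8 < 23 holds by construction, and the dialogue part of the project can still present the result in natural language.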

HTH
