Embeddings support for numbers

Hey community! I was wondering if anyone had thoughts on using the embeddings to represent relative rankings. For example, I have the following query, and 3 pieces of information.

Query: “Which company has the highest ranked Price (local)?”

Sentences:

  1. Ticker AAPL-USA has a Price (local) rank of 8
  2. Ticker ADP-USA has a Price (local) rank of 5
  3. Ticker ADM-USA has a Price (local) rank of 23

And yet, when I order the sentences by the dot product of their embeddings with the query embedding (descending), I get this order. It is very frustrating. Any thoughts on how to improve the sentence construction so that the embeddings reflect the content more accurately?

(Secondary question: if I fine-tuned a model on all of this type of data, would it perform much better?)

Thanks!!
Ben


Hi @ben1787

I don’t think you can easily use word embeddings to rank your tickers in the proper order (by rank), but I could be wrong.

I also don’t think, though I could be wrong, that you will get a language model to produce a ranking order through fine-tuning.

In my view, you are better off keeping the rankings in a database column and simply querying the DB and sorting to get the order.
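A minimal sketch of that database approach, using Python's built-in `sqlite3` with an in-memory database. The table and column names (`rankings`, `price_local_rank`) are hypothetical, and it assumes rank 1 is the best, so "highest ranked" means the lowest rank number.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rankings (ticker TEXT PRIMARY KEY, price_local_rank INTEGER)")
conn.executemany(
    "INSERT INTO rankings (ticker, price_local_rank) VALUES (?, ?)",
    [("AAPL-USA", 8), ("ADP-USA", 5), ("ADM-USA", 23)],
)

def ranked_tickers(conn):
    """Return (ticker, rank) rows ordered from best (lowest rank number) to worst."""
    rows = conn.execute(
        "SELECT ticker, price_local_rank FROM rankings ORDER BY price_local_rank ASC"
    )
    return rows.fetchall()

ranking = ranked_tickers(conn)
print(ranking)  # [('ADP-USA', 5), ('AAPL-USA', 8), ('ADM-USA', 23)]
print("Highest ranked:", ranking[0][0])  # Highest ranked: ADP-USA
```

If "highest ranked" should instead mean the largest rank number, switching `ASC` to `DESC` is the only change needed; the sort is exact either way, which is what the embeddings cannot guarantee.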

What you are attempting to do is better suited for traditional search and retrieval and not semantic search, in my view.

HTH

:slight_smile:

Sure, that is generally true, and this particular task may seem better suited to more standard methods. But it is part of a larger project, and I want it all incorporated into the dialogue.
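One way to keep this inside the dialogue while still getting a correct order is a hybrid: let the embeddings pick the relevant sentences, then parse the number out of each one and sort in ordinary code. The sketch below assumes the sentences follow the exact template from the question and that rank 1 is the best; the regex `PATTERN` and the function `sort_by_rank` are hypothetical names for illustration.

```python
import re

# Sentences as they might come back from an embedding-based retrieval step.
sentences = [
    "Ticker AAPL-USA has a Price (local) rank of 8",
    "Ticker ADP-USA has a Price (local) rank of 5",
    "Ticker ADM-USA has a Price (local) rank of 23",
]

# Hypothetical pattern matching the sentence template used in the question.
PATTERN = re.compile(r"Ticker (\S+) has a Price \(local\) rank of (\d+)")

def sort_by_rank(sentences):
    """Parse (ticker, rank) pairs and sort them numerically, lowest rank number first."""
    parsed = []
    for s in sentences:
        m = PATTERN.search(s)
        if m:  # skip sentences that do not follow the template
            parsed.append((m.group(1), int(m.group(2))))
    return sorted(parsed, key=lambda pair: pair[1])

ranked = sort_by_rank(sentences)
print(ranked[0])  # ('ADP-USA', 5)
```

The embeddings still decide *which* facts are relevant; the numeric comparison is simply moved out of vector space, where it does not hold up.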

The problem with vectors is that they don’t readily preserve numerical ordering. It’s hard to say what metric you would use to decide that vector v is greater than vector w. You could try magnitude, but many embedding models output unit vectors (length one). The information in those embeddings is therefore mostly directional, and strict numerical ordering is lost once you are comparing “angles” between vectors that represent numbers. Plus, the embeddings are trained (mostly) on the meaning of text, and they are useful for clustering and categorizing chunks of text.
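A toy illustration of the magnitude point, using hand-picked 2-D vectors rather than real embeddings: `w` is just `v` scaled by 10, so before normalization `w` has the larger magnitude, but after scaling both to unit length they become the same vector and magnitude no longer distinguishes them.

```python
import math

def magnitude(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to unit length, as many embedding models do for their outputs."""
    norm = magnitude(v)
    return [x / norm for x in v]

# Toy 2-D vectors (hypothetical, not real embeddings): w is v scaled by 10.
v = [1.0, 2.0]
w = [10.0, 20.0]

print(magnitude(v) < magnitude(w))  # True: before normalization, w is "bigger"

v_unit, w_unit = normalize(v), normalize(w)
print(magnitude(v_unit), magnitude(w_unit))  # both approximately 1.0
# Same direction means identical unit vectors: the "size" information is gone.
print(all(math.isclose(a, b) for a, b in zip(v_unit, w_unit)))  # True
```

With unit vectors, the dot product reduces to the cosine of the angle between them, so a ranking by dot product is a ranking by direction only.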


Hi @ben1787

Larger project or not, it is important to use the right software components in each part of the software architecture.

What you are attempting to do should be done with a traditional database lookup and ranking; it is not really the right fit for semantic search.

Just because you want to build a house with a screwdriver does not mean you should throw away the hammer and nails.

In your case, you seem to want to use embedding vectors, which are based on a language model, not a numerical model, to rank text that contains a number.

This is really not how embeddings are designed to work.

HTH

:slight_smile:
