I think what might be confusing about OpenAI embeddings is that the embedding vector for a phrase like “Anything you would like to share?” comes from an OpenAI model trained on text from the global internet. The same is true for the embedding vector for “I need to solve the problem with money”: the vector is produced by the OpenAI artificial neural network (ANN) together with a particular trained model.
The embeddings (vectors) are not based on a direct analysis of the input text alone, but on what the model learned from the huge dataset used to train the ANN. That is, at least, my current understanding.
So, using some Ruby code I cobbled together (using my own cosine similarity function, not from a library), let’s look at this:
irb(main):013:0> Embeddings.test_strings("I need to solve the problem with money","Anything you would like to share?")
=> 0.7614775318811315
irb(main):014:0> Embeddings.test_strings("I need to solve the problem with money","What is your financial situation?")
=> 0.8475256263838489
irb(main):015:0> Embeddings.test_strings("I need to solve the problem with money","Fraud")
=> 0.7632965853455049
irb(main):016:0> Embeddings.test_strings("I need to solve the problem with money","CitiBank")
=> 0.7823379047316411
If we rank these, the most similar are, in descending order:
- “What is your financial situation?”
- “CitiBank”
- “Fraud”
- “Anything you would like to share?”
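The ranking step itself is just a descending sort on the similarity scores; a sketch using the values from the irb session above (the hash here is hypothetical, built by hand from those results):

```ruby
# Scores copied from the irb session, keyed by candidate phrase.
similarities = {
  "Anything you would like to share?" => 0.7614775318811315,
  "What is your financial situation?" => 0.8475256263838489,
  "Fraud"                             => 0.7632965853455049,
  "CitiBank"                          => 0.7823379047316411,
}

# Sort by score, highest first, keeping only the phrases.
ranked = similarities.sort_by { |_phrase, score| -score }.map(&:first)
```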
This ranking makes perfect sense to me as a measure of similarity to “I need to solve the problem with money”.
So, based on what we might expect to see on the global internet, the above cosine similarities of embedding vectors produced by the text-embedding-ada-002 model seem reasonable to me.