Is it possible to fine tune the embedding model?

Greetings OpenAI community!

I have a question about using the embedding model text-embedding-ada-002. Is it possible to fine-tune this model? I could only find examples of fine-tuning the prompt models; however, extracting embeddings from the prompt models is forbidden.

What’s your use case? I don’t see why you would need to fine-tune embeddings.


We have an in-house recommendation model to match A and B (both are long texts; we first get their embeddings and then use a two-tower model trained with A-B pairs to do the ranking), and we would like to test the performance when using GPT-3 to initialize the embeddings for A and B. Ideally, fine-tuning the embeddings with positive and negative A-B pairs should give even better performance.
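
Roughly, the setup looks like this (a minimal sketch, not our actual in-house model; the layer sizes are placeholders):

```python
# Minimal two-tower sketch over pre-computed embeddings (illustrative only;
# not the actual in-house model, and layer sizes are placeholders).
import torch
import torch.nn as nn

def make_tower(in_dim=1536, out_dim=256):
    # Maps a text embedding (e.g. a 1536-dim ada-002 vector) into a shared space.
    return nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, out_dim))

tower_a, tower_b = make_tower(), make_tower()

def match_score(emb_a, emb_b):
    # Ranking score for an A-B pair: dot product in the shared space.
    return (tower_a(emb_a) * tower_b(emb_b)).sum(dim=-1)

# Usage with stand-in embeddings:
scores = match_score(torch.randn(8, 1536), torch.randn(8, 1536))  # shape (8,)
```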

Hello @ray001

From the API docs (which I have also confirmed via testing):

Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada. These are the original models that do not have any instruction-following training (like text-davinci-003 does, for example).

Reference:

OpenAI (Beta) - Fine-tuning

Hope this helps.


Thanks for the information. I also found this page and was wondering if anyone has found alternatives.

There are no “alternatives”.

@ray001 Did you end up finding a way to fine-tune ada? I am trying to do exactly what you wanted to do and would love to know if you’ve figured it out. Thanks!

Perhaps this bias matrix approach will be of use to some of those inquiring here.

“This notebook demonstrates one way to customize OpenAI embeddings to a particular task.”
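
Roughly, the idea in that notebook (paraphrased here, not its exact code) is to learn a matrix that the raw embeddings are multiplied by, so that cosine similarity of the transformed embeddings better matches your pair labels:

```python
# Sketch of the "learn a customization matrix" idea (paraphrased, not the
# notebook's exact code): optimize M so that cosine similarity of the
# transformed embeddings matches the pair labels (1 = similar, -1 = dissimilar).
import torch
import torch.nn.functional as F

def train_custom_matrix(emb1, emb2, labels, dim=1536, lr=1e-3, epochs=100):
    # emb1, emb2: (N, dim) embedding pairs; labels: (N,) of +1.0 / -1.0
    M = torch.eye(dim, requires_grad=True)          # start from the identity
    opt = torch.optim.Adam([M], lr=lr)
    for _ in range(epochs):
        sims = F.cosine_similarity(emb1 @ M, emb2 @ M)
        loss = ((sims - labels) ** 2).mean()        # push sims toward the labels
        opt.zero_grad(); loss.backward(); opt.step()
    return M.detach()

# At query time: customized = raw_embedding @ M, then compare with cosine similarity.
```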

From what I understand, the two-tower model is just a neural network on top of the embeddings, so why do you need to tune the original embedding model? You need to create another NN.

Here is the engine eBay uses:

There are some alternatives.
For example, in a two-tower model that takes embeddings of entities, you can:

  1. get the raw GPT-3 embeddings for those entities
  2. apply a set of CNN+FC layers to the original embeddings
  3. guide the training of the layers in step 2 with good/bad-fit labeled data
  4. use the raw GPT-3 embeddings processed by the CNN+FC layers trained in step 3 as the fine-tuned embeddings (see the sketch after this list)
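
A minimal sketch of steps 2–4 (PyTorch; the CNN/FC sizes and the BCE loss are placeholder choices, assuming pre-computed embeddings and binary good/bad-fit labels):

```python
# Sketch of steps 2-4: a small CNN+FC head on top of frozen GPT-3 embeddings,
# trained with good/bad-fit labels; its output is the "fine-tuned" embedding.
# (Architecture sizes are placeholders, not an actual tuned setup.)
import torch
import torch.nn as nn

class EmbeddingHead(nn.Module):
    def __init__(self, in_dim=1536, out_dim=256):
        super().__init__()
        # Treat the 1536-dim embedding as a 1-channel sequence for the Conv1d.
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(256),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(8 * 256, out_dim))

    def forward(self, x):                 # x: (batch, in_dim) raw embeddings
        z = self.cnn(x.unsqueeze(1))      # (batch, 8, 256)
        return self.fc(z)                 # (batch, out_dim) fine-tuned embeddings

head_a, head_b = EmbeddingHead(), EmbeddingHead()
opt = torch.optim.Adam(list(head_a.parameters()) + list(head_b.parameters()), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def train_step(emb_a, emb_b, label):      # label: 1.0 = good fit, 0.0 = bad fit
    score = (head_a(emb_a) * head_b(emb_b)).sum(dim=-1)   # dot-product logit
    loss = loss_fn(score, label)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage with stand-in data:
loss = train_step(torch.randn(4, 1536), torch.randn(4, 1536), torch.tensor([1., 0., 1., 0.]))
```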

Raw GPT-3 embeddings can already be used in a two-tower model and return a reasonable result. This is because the more critical part of a two-tower model is the embedding, compared to the NN layers after it.

The reason one could benefit from fine-tuning the original GPT-3 embedding is that the raw GPT-3 embedding might not have been exposed to the specific task or the subdomain knowledge.

A foo-bar example: imagine there is a limited corpus with only 2 phrases,
[“machine operation”, “artificial intelligence”]
and we want to find the one most similar to the input phrase ‘machine learning’. Similarity calculation using the raw GPT-3 embeddings returned:
machine learning and machine operation have a similarity score of 0.87
machine learning and artificial intelligence have a similarity score of 0.88
Both scores make sense, since the first reflects letter overlap and the second reflects semantic meaning. But in my use case the first type of similarity would introduce noise. I managed to fix it in the reply to vamsi; please feel free to have a look and see if it makes sense to you.
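
For reference, those scores are plain cosine similarities over the raw embeddings; something like the sketch below is how they can be computed (using the openai Python client, pre-v1 API; the exact numbers will vary):

```python
# Sketch: cosine similarity between raw ada-002 embeddings
# (legacy pre-v1 openai Python client; scores will vary by run/model).
import numpy as np
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return np.array(resp["data"][0]["embedding"])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("machine learning")
print(cosine(query, embed("machine operation")))        # lexically close
print(cosine(query, embed("artificial intelligence")))  # semantically close
```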

At a high level I understand what you are saying, which is, you need high scores on semantic meaning and not word overlap. Got it. Then you say you can achieve this by a NN (two-tower). Got it. Then you say the fine-tuned embedding is the output of your NN. Got it. All of this is fine and good and doesn’t need a direct fine-tune of the original embedding engine, since you are creating them in the output of your NN. I think you answered your own question, which is yes, you can create a fine-tuned embedding, which is created by the output of your own neural net. Totally feasible and makes sense. But you can’t upload some training file to the OpenAI API for text-embedding-ada-002 and get the same thing, which is what I thought your original post was about.

And FYI, you can improve the geometry of the embeddings too; I did this in this thread: Some questions about text-embedding-ada-002’s embedding - #42 by curt.kennedy

It removes the mean embedding vector and uses PCA to reduce the dimensions and increase the spread without altering the meaning too much.
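
A rough sketch of that kind of post-processing (scikit-learn; the component count here is arbitrary, see the linked thread for the actual details):

```python
# Rough sketch of the post-processing described above: subtract the mean
# embedding, then use PCA to reduce dimensions and spread the vectors out.
# (Component count is arbitrary here; see the linked thread for details.)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

embeddings = np.random.randn(1000, 1536)           # stand-in for ada-002 embeddings

centered = embeddings - embeddings.mean(axis=0)    # remove the mean vector
pca = PCA(n_components=256)                        # reduce dimensionality
reduced = pca.fit_transform(centered)
processed = normalize(reduced)                     # unit length for cosine similarity

# New/query embeddings must go through the same mean removal, PCA, and normalization.
```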

So yeah, post-processing of the embeddings is certainly advised and encouraged in certain situations.