I have a question about the usage of the embedding model text-embedding-ada-002. Is it possible to fine-tune this model? I could only find examples for fine-tuning the prompt models; however, extracting embeddings from prompt models is not allowed.
We have an in-house recommendation model that matches A and B (both are long texts; we first get their embeddings and then use a two-tower model trained on A-B pairs to do the ranking), and we would like to test the performance of using GPT-3 to initialize the embeddings for A and B. Ideally, fine-tuning the embedding with positive and negative A-B pairs should give even better performance.
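For readers unfamiliar with the setup, here is a minimal sketch of what such a two-tower model might look like, assuming PyTorch and precomputed 1536-dimensional ada-002 embeddings; the tower architecture and sizes are invented for illustration, not the poster's actual model:

```python
import torch
import torch.nn as nn

EMB_DIM = 1536  # text-embedding-ada-002 output dimension

class Tower(nn.Module):
    """Maps a frozen text embedding into a shared matching space."""
    def __init__(self, emb_dim: int = EMB_DIM, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 512),
            nn.ReLU(),
            nn.Linear(512, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so the dot product below behaves like cosine similarity
        return nn.functional.normalize(self.net(x), dim=-1)

tower_a, tower_b = Tower(), Tower()

# Score a batch of A-B pairs: a higher dot product means a better match.
a_emb = torch.randn(8, EMB_DIM)  # stand-in for ada-002 embeddings of A
b_emb = torch.randn(8, EMB_DIM)  # stand-in for ada-002 embeddings of B
scores = (tower_a(a_emb) * tower_b(b_emb)).sum(dim=-1)
```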
From the API docs (which I have also confirmed via testing):
Fine-tuning is currently only available for the following base models: davinci, curie, babbage, and ada. These are the original models that do not have any instruction-following training (like text-davinci-003 does, for example).
@ray001 Did you end up finding a way to fine-tune ada? I am trying to do the exact thing that you wanted to and would love to know if you've figured it out. Thanks!
From what I understand, the two-tower model is just a neural network on top of the embeddings, so why do you need to tune the original embedding model? You just need to create another NN on top.
Raw GPT-3 embeddings can already be used in a two-tower model and return reasonable results. That is because the more critical part of a two-tower model is the embedding, not the NN layers that follow.
The reason one could benefit from fine-tuning the original GPT-3 embedding is that the raw GPT-3 embedding may not have been exposed to the specific task or subdomain knowledge.
A foo-bar example: imagine there is a limited corpus with only two entries,
["machine operation", "artificial intelligence"]
and we want to find the entry most similar to the input "machine learning". A similarity calculation using raw GPT-3 embeddings returned:
"machine learning" vs. "machine operation": sim score 0.87
"machine learning" vs. "artificial intelligence": sim score 0.88
Both scores make sense, since the first comes from letter overlap and the second from semantic meaning. But in my use case the first type of similarity would introduce noise. I managed to fix it in the reply to vamsi. Please feel free to have a look and see if it makes sense to you.
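For concreteness, here is a minimal sketch of that similarity check, assuming the openai Python client (v1+) with an API key in the environment; the 0.87/0.88 figures above are just the scores the poster observed, so your numbers may differ:

```python
import numpy as np
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY set

client = OpenAI()

corpus = ["machine operation", "artificial intelligence"]
query = "machine learning"

# Embed the corpus and the query in one request.
resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input=corpus + [query],
)
vecs = np.array([d.embedding for d in resp.data])

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for text, vec in zip(corpus, vecs[:2]):
    print(f"sim({query!r}, {text!r}) = {cosine(vecs[2], vec):.2f}")
```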
At a high level I understand what you are saying, which is: you need high scores on semantic meaning and not word overlap. Got it. Then you say you can achieve this with a NN (two-tower). Got it. Then you say the fine-tuned embedding is the output of your NN. Got it. All of this is fine and good and doesn't need a direct fine-tune of the original embedding engine, since you are creating the embeddings at the output of your NN. I think you answered your own question: yes, you can create a fine-tuned embedding, created as the output of your own neural net. Totally feasible and makes sense. But you can't upload a training file to the OpenAI API for text-embedding-ada-002 and get the same thing, which is what I thought your original post was about.
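To make that distinction concrete, here is a hedged sketch of the "fine-tune on top" approach: freeze the ada-002 vectors and train a small projection head on labelled positive/negative A-B pairs with a contrastive objective. Everything here (PyTorch, the head size, the loss choice, the hyperparameters) is an illustrative assumption, not anyone's actual setup:

```python
import torch
import torch.nn as nn

EMB_DIM = 1536  # text-embedding-ada-002 output dimension

# The "fine-tuned embedding" is the output of this head, not of ada-002 itself.
head = nn.Linear(EMB_DIM, 256)
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
# CosineEmbeddingLoss pulls positive pairs together and pushes negatives apart.
loss_fn = nn.CosineEmbeddingLoss(margin=0.2)

# Stand-in batch: in practice these would be frozen ada-002 embeddings of
# real A-B pairs, with +1 labels for matches and -1 for negatives.
a = torch.randn(32, EMB_DIM)
b = torch.randn(32, EMB_DIM)
labels = (torch.rand(32) > 0.5).float() * 2 - 1

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(head(a), head(b), labels)
    loss.backward()
    opt.step()

# After training, head(ada_embedding) is the task-adapted embedding.
```

The point of the design is that the head, not the frozen model, carries all the task adaptation, which is exactly why no fine-tuning endpoint for the embedding model itself is required.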