Fine tuning vs. Embedding

Hi
I have been reading a lot of posts and watching videos on the use cases and applicability of fine-tuning vs. embedding. Over this time, my understanding of whether I should or can use fine-tuning to introduce new knowledge has flip-flopped several times.

I have read that fine-tuning is not the right tool if you want an existing model to learn new information, and that embedding is meant for this. Fine-tuning, instead, teaches new structures.

But now I’m not so sure.

Finally, I asked ChatGPT using the example of teaching an existing model about cities on Mars which would be new to it. It qualified its answer but did say:

‘Fine-tuning a pre-trained model on a new dataset that includes information about cities on Mars could allow the model to learn new knowledge about these cities, such as their name, population, and other characteristics. This new knowledge would be stored in the model’s weights, and the model would be able to use this knowledge to answer questions about the cities on Mars.’

Is this accurate? Can I fine-tune with a series of discrete pieces of data (e.g., facts about cities on Mars) and then later ask questions about this new data?

Thank you.

6 Likes

I got you; it is hard to find a guideline on this…

I recommend reading these two:

2 Likes

If you want a detailed walkthrough on using embeddings for questions about Mars, I would mimic this tutorial.

Basically, you embed all your facts about Mars. Then, when a question comes in, you embed that too. You correlate the incoming question with the entire set of embedded facts. Based on the top correlations, you pull the top facts from the database and form a prompt out of them (truncating to fit the limited size of the prompt window). Then you ask GPT-3 to answer the question based on the top correlated facts in your prompt.

This is probably the best way to extract specific knowledge.
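The embed, rank, and prompt steps described above can be sketched roughly as follows. This is only an illustration: `toy_embed` is a hypothetical stand-in (a bag-of-words count vector) for a real embedding model, the Mars facts are invented sample data, and `max_chars` is a crude assumed proxy for the prompt-window limit. In practice you would swap in real embeddings and send the resulting prompt to the model.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_facts(question, facts, k=2):
    """Rank every fact against the question and keep the k best matches."""
    q = toy_embed(question)
    ranked = sorted(facts, key=lambda f: cosine(q, toy_embed(f)), reverse=True)
    return ranked[:k]

def build_prompt(question, facts, k=2, max_chars=500):
    """Pack the top facts into a prompt, stopping once the size budget is hit."""
    context = ""
    for fact in top_facts(question, facts, k):
        line = "- " + fact + "\n"
        if len(context) + len(line) > max_chars:
            break
        context += line
    return ("Answer the question using only the facts below.\n"
            f"Facts:\n{context}\nQuestion: {question}\nAnswer:")

# Invented sample facts, purely for illustration.
facts = [
    "Olympus City has a population of 12,000.",
    "Ares Station is the oldest city on Mars.",
    "Valles Harbor sits at the edge of Valles Marineris.",
]
prompt = build_prompt("What is the population of Olympus City?", facts)
print(prompt)
```

The prompt that comes out contains only the best-matching facts, so the model answers from the supplied text rather than from whatever is in its weights.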

If you fine-tune, it might not be as specific to your facts as you like because you are trying to overcome the noise from the entire set of GPT-3 coefficients (which was trained on the internet, and may not possess your facts).

When it comes to vector databases, you can probably ditch them if you have fewer than a million embedded facts, but you (or someone helping you) would have to be proficient with databases and some amount of coding to achieve this on your own. So don’t get scared away by Pinecone or Weaviate.
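As a rough idea of what skipping the vector database could look like, here is a minimal sketch that stores normalized vectors in an ordinary SQLite table and does a brute-force scan at query time. The table layout, the helper names (`create_store`, `add_fact`, `search`), and the three-dimensional vectors are all assumptions made for illustration. Real embeddings would come from an embedding model and have far more dimensions.

```python
import heapq
import json
import math
import sqlite3

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def create_store():
    """Create an in-memory SQLite table holding each fact and its vector as JSON."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, text TEXT, vec TEXT)")
    return conn

def add_fact(conn, text, vec):
    """Store a fact together with its unit-length embedding."""
    conn.execute("INSERT INTO facts (text, vec) VALUES (?, ?)",
                 (text, json.dumps(normalize(vec))))

def search(conn, query_vec, k=3):
    """Brute-force scan: score every stored vector and return the k best."""
    q = normalize(query_vec)
    rows = conn.execute("SELECT text, vec FROM facts")
    scored = ((sum(a * b for a, b in zip(q, json.loads(vec))), text)
              for text, vec in rows)
    return heapq.nlargest(k, scored)

conn = create_store()
# Placeholder vectors; in practice these come from an embedding model.
add_fact(conn, "fact A", [1.0, 0.0, 0.0])
add_fact(conn, "fact B", [0.0, 1.0, 0.0])
add_fact(conn, "fact C", [0.7, 0.7, 0.0])
results = search(conn, [0.9, 0.1, 0.0], k=2)
print([text for _, text in results])
```

A linear scan like this touches every row on each query. That is the trade-off being accepted in exchange for not running a dedicated vector database.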

4 Likes

I have signed up for the course. Thank you @pinardalec

What you describe @curt.kennedy is where my current thinking is at. Thank you.

1 Like

Interesting discussion. Another point of view might be Codex. Do I prefer building embeddings with my (company’s private) code bases or using fine-tuning?

Embeddings vs. fine-tuning is a great question. I think we should update the docs to give better guidance on this. I will add it to the queue, but there are a lot of other things that need to be updated before that, so hang tight.

10 Likes

This is a clear answer and I agree, but to me the elephant in the room is the size constraint placed on the prompt. It limits the context of the prompt and therefore the resources available to answer.

4 Likes

I found this video quite useful in understanding the difference

From my understanding:
Fine-tuning is a way to add new knowledge to an existing model. So it’s a simple upgrade, with the same usage.
Embedding is a way to let the model search in a “database” and return the best result. So it’s good for finding something specific, for example.

2 Likes