Fine-tuning vs. Embedding

I have been doing a lot of reading and watching videos on the use cases and applicability of fine-tuning vs. embedding. Over this time, my understanding of whether I should, or even can, use fine-tuning to introduce new knowledge has flip-flopped several times.

I have read that fine-tuning is not what to use if you want an existing model to learn new information, and that embedding is for this; instead, fine-tuning teaches new structure.

But now I’m not so sure.

Finally, I asked ChatGPT, using the example of teaching an existing model about cities on Mars, which would be new to it. It qualified its answer, but did say:

‘Fine-tuning a pre-trained model on a new dataset that includes information about cities on Mars could allow the model to learn new knowledge about these cities, such as their name, population, and other characteristics. This new knowledge would be stored in the model’s weights, and the model would be able to use this knowledge to answer questions about the cities on Mars.’

Is this accurate? Can I fine-tune with a series of discrete pieces of data (e.g. facts about cities on Mars) and then later ask questions about this new data?

Thank you.


I hear you; it is hard to find a clear guideline on this…

I recommend reading these two:


If you want a detailed walkthrough on using embeddings for questions about Mars, I would mimic this tutorial.

Basically, you embed all your facts about Mars. When a question comes in, you embed it too, and correlate it against the entire set of embedded facts. Based on the top correlations, you pull the top facts from the database and form a prompt out of them (truncated to fit the limited size of the prompt window). Then you ask GPT-3 to answer the question based on the top correlated facts in your prompt.
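The steps above (embed the facts, embed the question, correlate, take the top hits, truncate into a prompt) can be sketched like this. The Mars facts are made up, and the bag-of-words `embed` is a toy stand-in for a real embeddings API:

```python
import re
import numpy as np

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

# Hypothetical facts about Mars cities (the "database").
facts = [
    "Olympus City has a population of 12,000 settlers.",
    "Valles Station sits at the edge of Valles Marineris.",
    "Olympus City was founded near Olympus Mons in 2081.",
]

# Toy bag-of-words embedding; in practice each call here would hit
# a real embeddings endpoint instead.
vocab = sorted({t for f in facts for t in tokenize(f)})

def embed(text: str) -> np.ndarray:
    vec = np.array([float(tokenize(text).count(w)) for w in vocab])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

fact_vectors = np.stack([embed(f) for f in facts])

def top_facts(question: str, k: int = 2) -> list[str]:
    scores = fact_vectors @ embed(question)   # cosine similarity (unit vectors)
    return [facts[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question: str, max_chars: int = 1500) -> str:
    context = "\n".join(top_facts(question))[:max_chars]  # truncate to fit the window
    return (f"Answer the question using only these facts:\n{context}\n\n"
            f"Q: {question}\nA:")

print(build_prompt("What is the population of Olympus City?"))
```

The resulting prompt string is what you would send to the completion endpoint.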

This is probably the best way to extract specific knowledge.

If you fine-tune, the result might not be as specific to your facts as you would like, because you are trying to overcome the noise from the entire set of GPT-3 coefficients (which was trained on the internet, and may not possess your facts).

When it comes to vector databases, you can probably ditch them if you have fewer than a million embedded facts, but you (or someone helping you) would have to be proficient with databases and a fair amount of coding to achieve this on your own. So don't let Pinecone or Weaviate scare you away.
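As a rough illustration of skipping the vector database, a brute-force search over an in-memory matrix is often fast enough at this scale. The random vectors below are stand-ins for real embeddings, and 1536 dimensions is an assumption matching typical OpenAI embedding sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 10_000, 1536           # 10k stand-in "facts"; the same approach
                                # scales toward the order of a million rows
db = rng.standard_normal((n, dim), dtype=np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)   # unit-normalise once, at load time

def search(query_vec, k: int = 5):
    q = query_vec / np.linalg.norm(query_vec)
    scores = db @ q                           # one matrix-vector product
    return np.argpartition(scores, -k)[-k:]   # indices of the k best matches

hits = search(rng.standard_normal(dim, dtype=np.float32))
print(hits)
```

`argpartition` avoids a full sort, so the whole query is one matrix-vector product plus a partial selection.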


I have signed up for the course. Thank you @pinardalec

What you describe @curt.kennedy is where my current thinking is at. Thank you.


Interesting discussion. Another point of view might be Codex. Do I prefer building embeddings with my (company’s private) code bases or using fine-tuning?


Embeddings vs. fine-tuning is a great question. I think we should update the docs to give better guidance on this. I will add it to the queue, but there are a lot of other things that need to be updated before that, so hang tight.


This is a clear answer and I agree, but to me the elephant in the room is the size constraint placed on the prompt: it limits the context of the prompt and therefore the resources available to answer.


I found this video quite useful for understanding the difference.

From my understanding:
Fine-tuning is a way to add new knowledge to an existing model, so it's a simple upgrade with the same usage.
Embedding is a way to let the model search a "database" and return the best result, so it's good for finding something specific, for example.


Hi. So, just to understand clearly: what this video suggests is that if we want GPT to answer questions pertaining to our domain or custom data, with minimal or no hallucination, then embeddings are the way to go over fine-tuning, as fine-tuning will augment the data but GPT will still have access to its training data from the WWW.

However, embeddings restrict it to looking for the answer only in the chunks of text that were embedded via the vector DB, and answering based on those. So it will be more accurate, with no hallucination.

Is this understanding correct?


Is there an update to the docs on the difference between the two? Also, is my understanding in the comment below correct?


My understanding (please keep in mind it is only based on my readings over the past couple of weeks, and I am not an expert) is that with fine-tuning, you are trying to overpower the knowledge the model already has, which is huge. I guess you would need to focus heavily on cities on Mars and fine-tune with thousands of examples; as that is knowledge the model did not have before, it would probably work in this case. It would be difficult in most other cases, as the existing knowledge about almost any topic is huge.

I would guess that fine-tuning could also work on the way the model forms a reply: for example, fine-tuning it to create replies in the manner of a basketball commentator, if that is your segment. Not the content, but the sentences and phrases.

For everything else, I believe embeddings are the way to go.

I’m inclined towards what you suggest.

That said, I have yet to get around to looking at all the useful material people have supplied in this thread, for which I'd like to thank everyone.

Too little time and not enough tokens to summarise it all! :upside_down_face:

I have tested this, and GPT still makes up content. It does look in my embeddings, but not exclusively.
A system prompt may help, like "Only give answers related to XXX", where XXX is the theme of your embeddings. But if your embeddings cover a very broad topic, GPT still hallucinates (at least in my case).
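To make that concrete, here is one way to build the request messages around such a restrictive system prompt. The wording and the example strings are illustrative; as noted above, this reduces but does not eliminate hallucination:

```python
def grounded_messages(context: str, question: str, theme: str) -> list[dict]:
    # System prompt that tries to pin the model to the retrieved context.
    # The exact wording is an assumption, not a guaranteed fix.
    system = (
        f"You answer questions about {theme}. "
        "Use ONLY the context below. If the answer is not in the context, "
        "reply exactly: I don't know."
    )
    return [
        {"role": "system", "content": f"{system}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]

msgs = grounded_messages(
    context="Olympus City population: 12,000.",
    question="How tall is Olympus Mons?",
    theme="Mars settlements",
)
```

The returned list is what you would pass as the `messages` argument of a chat-completion call.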

I am working on a project that has two purposes: 1. deep domain expertise, and 2. limiting responses to those from authentic sources. This is relevant for domains like medicine or aircraft engineering. Is embedding the way to go? And in that case, I cannot pass the embeddings as an API argument every time, as the vector database may be too big. What would be the way out?

Something that may help.

We’ve been involved in training (fine-tuning) a number of foundational models (open source) using various epochs, last-layer training (freezing layers), multi-step training (i.e., training once over the full model, then training again using a different set of training data), and fine-tuning with both small and large datasets. We’ve also used vector databases (Weaviate/Pinecone). Our task was to experiment with Constitutional AI. This is what I learned (it may be different for others, but I thought it would help here, if only to open a new train of thought).

  1. As a general statement, fine-tuning does tend to match the desired answer (we have had acceptable results), but in doing so it can reduce the extent of creative/generative outcomes. In a roundabout way, it feels like you are changing the weights to fit certainty onto hallucination, i.e., extending the model’s blind spot to respond in a particular manner.

  2. It is easy to overfit, even if you train only over the last few layers: for example, 6 layers / 4 epochs / 90k instruction sets on smaller models.

  3. It takes time and is expensive to get it right, and even then we don’t know what we don’t know (i.e., which parts of the foundational model are kept or not). For example, a model trained over coding data in set one was then retrained with conversational data in set two, and the coding results, which were good before, were no longer reliable. Training both sets at the same time yielded better results, such that LLM + fine-tune 1 + fine-tune 2 does not equal LLM + fine-tune (1, 2).

  4. By embedding into a vector DB with keyword + semantic search, our results were significantly better than with fine-tuning. By a lot. We first pass the query keywords, then extract the clusters, pass them through a prompt, and have the model piece the answer together. It doesn’t lose any of its foundational-model underpinnings. You can use a combination of SQL/vector; it’s just a memory store.

  5. This system can be used for reinforcement learning, adversarial systems, and agent and character construction.

  6. I can see the case for carefully fine-tuning models for better performance, but that is for highly specialized applications, or as a natural accuracy and behavior improvement over time, after using the embedding process to reduce data, structure it well, and infer behavior.
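Point 4 above can be sketched as a keyword pass that narrows the candidate set, followed by a semantic pass over the survivors. The documents, the 8-dimension random vectors, and the fallback behaviour are all illustrative stand-ins, not the actual pipeline described:

```python
import re
import numpy as np

def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical domain documents (medical / aviation flavoured).
docs = [
    "Dosage guidance for compound X in adult patients.",
    "Maintenance schedule for the A320 hydraulic system.",
    "Adverse reactions reported for compound X.",
]

# Stand-in embeddings: one random unit vector per doc (a real system
# would use an embedding model here).
rng = np.random.default_rng(1)
doc_vecs = rng.standard_normal((len(docs), 8))
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def hybrid_search(query: str, query_vec, k: int = 2) -> list:
    q_tokens = tokenize(query)
    # 1) keyword pass: keep docs sharing at least one query token
    candidates = [i for i, d in enumerate(docs) if q_tokens & tokenize(d)]
    if not candidates:
        candidates = list(range(len(docs)))    # fall back to pure semantic search
    # 2) semantic pass: rank the surviving cluster by cosine similarity
    qv = query_vec / np.linalg.norm(query_vec)
    ranked = sorted(candidates, key=lambda i: float(doc_vecs[i] @ qv), reverse=True)
    return [docs[i] for i in ranked[:k]]
```

The keyword filter keeps the semantic step cheap and grounded in documents that literally mention the query terms.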

IMO (and I’m not trying to convince anyone here, just my thoughts), I believe that keeping foundational models as-is, and tapping into a system of databases for memory plus prompt engineering for manipulation and output, is the right model. Then, in time, refine or fine-tune the models with more sophisticated frameworks and behavioral improvements, but only after a long, carefully considered period (that’s our approach: 20% of our team’s time is spent tinkering with training, the other 80% is real progress with current tech).

Hope this helps.


Thank you for the detailed information @james.duchenne and welcome to the community.


Just to make sure that I understand your 4) example:

  • you build an embedding DB with your domain-specific knowledge
  • you take the user input, embed it, and run it through the vector DB
  • you collect the response of this query and use it as context for the prompt
  • you send the prompt, enriched by the vector DB result, to the LLM

Is this what you’re doing?

Additional question: is this “persistent”, i.e., the more queries you run, the better the LLM performs on your queries?

Thanks in advance

Dominique, at the most basic level, yes, your flow is correct. This alone, however, will not give you improvements (or more accurate answers) with use. For this, you will need to concatenate history with your third step, where relevant, and/or extend the structure to store user feedback (i.e., reward/penalties) to be queried first. The latter can also be used with diversity ratings in fine-tuning operations for private LLMs. Trust this helps.
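One minimal way to sketch the "concatenate history with your third step" idea is a rolling window of past turns folded into the prompt alongside the retrieved facts. The class name, window size, and prompt layout here are illustrative assumptions:

```python
from collections import deque

class ConversationContext:
    """Rolling conversation memory concatenated into the retrieval context."""

    def __init__(self, max_turns: int = 4):
        self.history = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def record(self, question: str, answer: str) -> None:
        self.history.append((question, answer))

    def build(self, retrieved_facts: list, question: str) -> str:
        past = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history)
        facts = "\n".join(retrieved_facts)
        return (f"Previous turns:\n{past or '(none)'}\n\n"
                f"Relevant facts:\n{facts}\n\n"
                f"Q: {question}\nA:")

ctx = ConversationContext()
ctx.record("Where is Olympus City?", "Near Olympus Mons.")
prompt = ctx.build(["Olympus City population: 12,000."],
                   "How many people live there?")
print(prompt)
```

The bounded deque keeps the history from growing past the prompt window; user feedback or reward signals could be stored alongside each turn in the same structure.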

Thanks. Very useful. When you mention “concatenating history,” are you referring to the ability to reuse past responses and incorporate them into the context for in-context learning? Is there a restriction on the size of the context, such that if it becomes too extensive, the LLM’s comprehension of the question diminishes?