Fine-tuning vs. Embedding

Something that may help.

We’ve been involved in training (fine-tuning) a number of open-source foundation models in various ways: different epoch counts, last-layer training (freezing layers), multi-step training (i.e., training once over the full model, then training again on a different dataset), and fine-tuning with both small and large datasets. We’ve also used vector databases (Weaviate/Pinecone). Our task was to experiment with Constitutional AI. This is what I learned (it may be different for others, but I thought it might help here, if only to open a new train of thought).

  1. As a general statement, fine-tuning does tend to match the desired answer (we have had acceptable results), but in doing so it can reduce the range of creative/generative outcomes. In a roundabout way, it feels like you are changing the weights to fit certainty onto hallucination; i.e., extending the model’s blind spot so that it responds in a particular manner.

  2. It is easy to overfit even if you only train the last few layers: for example, 6 layers, 4 epochs, and a 90k-example instruction set on smaller models.
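For what it’s worth, the layer-freezing part of that setup looks roughly like this in PyTorch (a minimal sketch with a toy stack of linear layers; the sizes and counts are illustrative, not our actual architecture):

```python
import torch.nn as nn

# Toy stand-in for a transformer stack: 12 hidden "layers" plus a head.
# (Illustrative only -- real runs used an open-source foundation model.)
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(12)], nn.Linear(16, 4))

# Freeze everything first...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the last 6 modules, as in the 6-layer runs above.
for layer in list(model.children())[-6:]:
    for param in layer.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")
```

Only the unfrozen parameters get gradient updates, which is what makes this cheaper than full fine-tuning, but as noted above it does not protect you from overfitting.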

  3. It takes time and is expensive to get right, and even then we don’t know what we don’t know (i.e., which parts of the foundation model are kept or lost). For example, a model trained on coding data in set one was then retrained on conversational data in set two, and the coding results, which were good before, were no longer reliable. Training on both sets at the same time yielded better results, such that LLM + fine-tuned 1 + fine-tuned 2 does not equal LLM + fine-tuned (1+2).
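The “both sets at once” approach amounts to interleaving the two datasets before training rather than running two sequential passes. Roughly (dataset names and contents here are made up for illustration):

```python
import random

# Hypothetical stand-ins for the two instruction sets from the example above.
coding_set = [("code", f"write function {i}") for i in range(1000)]
chat_set = [("chat", f"conversation {i}") for i in range(1000)]

# Sequential (what degraded our coding results): all of set 1, then all of set 2,
# so the second pass can overwrite behaviour learned in the first.
sequential = coding_set + chat_set

# Mixed (what worked better): shuffle both together so every stretch of
# training sees examples of both tasks.
mixed = coding_set + chat_set
random.seed(0)
random.shuffle(mixed)
```

The `sequential` and `mixed` lists would then be fed to whatever training loop you use; only the ordering differs.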

  4. By embedding into a vector DB with keyword + semantic search, our results were significantly better than with fine-tuning. By a lot. We first pass the query through keyword search, then extract the matching clusters, pass them through a prompt, and have the model piece the answer together. The model doesn’t lose any of its foundation-model underpinnings. You can use a combination of SQL/vector; it’s just a memory store.
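A stripped-down sketch of that retrieval flow, with toy documents and hand-made 3-dimensional vectors standing in for Weaviate/Pinecone (every name, vector, and document here is made up for illustration):

```python
import math

# Toy corpus; in a real system the text and embeddings live in the vector DB.
docs = [
    {"text": "refund policy: returns accepted within 30 days", "vec": [0.9, 0.1, 0.0]},
    {"text": "shipping times vary by region", "vec": [0.1, 0.8, 0.1]},
    {"text": "refunds are issued to the original payment method", "vec": [0.8, 0.2, 0.1]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def retrieve(query_text, query_vec, top_k=2):
    """First pass: keyword filter. Second pass: rank the survivors semantically."""
    keywords = query_text.lower().split()
    hits = [d for d in docs if any(k in d["text"].lower() for k in keywords)]
    hits.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in hits[:top_k]]

# The retrieved chunks are then stitched into a prompt and the model
# pieces the answer together from them, untouched by any weight changes.
context = retrieve("refund", [0.85, 0.15, 0.05])
prompt = "Answer only from this context:\n" + "\n".join(context)
```

In production the keyword pass would be the DB’s BM25/filter step and the vectors would come from a real embedding model; the point is just the two-stage keyword-then-semantic flow feeding a prompt.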

  5. This system can be used for reinforcement learning, adversarial systems, and agent and character construction.

  6. I can see the case for carefully fine-tuning models for better performance, but that is for highly specialized applications, or as a natural accuracy and behavior improvement over time after using the embedding process to reduce the data, structure it well, and infer behavior.

IMO (and I’m not trying to convince anyone here, just my thoughts), keeping foundation models as-is and tapping into a system of databases for memory and prompt engineering for manipulation and output is the right model. Then, in time, refine or fine-tune the models with more sophisticated frameworks and behavioral improvements, but only after a long, carefully considered period. That’s our approach: 20% of our team’s time is spent tinkering with training; the other 80% is real progress with current tech.

Hope this helps.
