Any experiences of tips for continuing from previous finetune for specific purpose?
EDITED EXAMPLE:
First finetune: write social media posts for any social media
Next finetune needed: write LinkedIn posts
When to continue from first finetune, when to start from scratch.
Old was: Let’s say my first finetune is trained with Wikipedia pages about animals. Next I want to create a finetune about cat breeds.
Is it an advantage or disadvantage to continue from the 1st finetune?
I’d expect the 1st finetune has lost some generalization ability. I am afraid is has lost more of generalization than is necessary.
However, it is already trained on animal related topics, and I want to narrow it down further. So perhaps I will be able to use the animals as base model to training for faster?
Fine-tuning under OpenAI’s fine-tuning endpoint is not designed to inject new knowledge into the model. Therefore, for your particular case, it would not be an appropriate solution to begin with.
You are correct, so let’s change example to:
First finetune: write social media posts for any social media
Next finetune needed: write LinkedIn posts
Q: When to continue from first finetune, when to start from scratch?
The example itself is irrelevant. I am trying to understand are there use cases where we can continue from previous finetunes for new tasks.
As a rule thumb, I’d say that if conceptually you are trying to achieve the same and the general expectations for the style and format of the output are fairly similar, then it can make sense to continue.
But the approach to take for the second iteration of fine-tuning may differ case to case. For example, if your goal is to narrow it down to just LinkedIn posts, it might be good enough to just add additional training examples focused on LinkedIn posts only. If you want the model to still be capable to create posts for LinkedIn and other channels, then I would in your updated data set include training examples for both cases.
In my mind, it is very much a case by case decision. There’s no hard and fast rule for one or the other approach.
Of course, if your existing model is already underperforming, then I would in almost all cases start with a fresh fine-tuning as opposed to trying to compensate the existing underperformance with new and different training examples.
That is a great point. If I already know I need to train for LinkedIn I can do it at the same time with different system prompts. Or, the continued finetune could just include more Linkedin examples, along with the old ones.
Possibly I may have been thinking too much along the old ML models thinking with one model, one task, always refit for even slightly new case. LLM’s are general so perhaps the finetunes should be also as generalized as possible…?
I have lots of old specialist finetunes, but maybe I shoudl try a “super-finetune” with all my use cases included ? Just change the system prompt for each case. Any ideas of disadvantages for it?
let’s say one finetune for both writing blog posts, email, social media, each with 5 different styles but also doing logical math reasoning etc (again just made up examples). Could it become a multi-specialist? Or is still better to focus on one task for each finetuning (maybe due to fixed low rank of finetunes etc.)
I think that at the core, the task should still be the same for the fine-tuned model in question. Trying to fine-tune for multiple different tasks would be counter-intuitive to the objective of fine-tuning. So in the social media post example, it would fundamentally be about writing posts catered to different audiences and circumstances.
But like you suggest, ideally you can plan ahead and when you start the fine-tuning process you design your prompts in a way that they could be expanded to cater to different cases.