Can you iteratively train a fine-tuned model?

Not sure if I missed it in the docs, but is it possible to fine-tune a fine-tuned model? Or can you only run a fine-tune job on the base models?

For example, if I train a model and later gather a new batch of training material, can I further fine-tune my existing model? Or do I need to run a fine-tune job from scratch on a base model using the combined training material?

Thanks!

1 Like

No, you can only add data to your original dataset and train a new model. This question gets asked often; maybe we can request that @boris or @luke update the documentation.

5 Likes

Ok, thanks for that. Yeah, definitely should be in the documentation.

This was a while ago, but your question comes up in Google results, so I wanted to provide an update: you can now conduct additional fine-tuning on previously fine-tuned models. Although it’s not very well documented, the evolved model apparently ends up with a different name but carries all the training of the previous models.

@daveshapautomator I’m doing some experimenting now, but in cases where you’re fine-tuning a third time, do you happen to know whether you’re supposed to use the name of the second model rather than the name of the first? I’m guessing you use the second model, and that if you instead fine-tune using the name of the first model, it will just apply your third set of training data to the first model rather than stringing it all together. Agree?

EDIT - Answered my own question above. I’ve found it’s best to use the most recent version of your model. I don’t typically re-use training data. If you want to iterate but you don’t have enough subsequent data and you’re concerned about weights (and not concerned about cost), then the easy way is to train from one or two versions back and aggregate the training data used for those versions along with your new data, to ensure a more even allocation of weight.
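For example, here is a sketch of that aggregate-and-retrain approach with the CLI (every file name and model name below is made up for illustration):

```
# Combine the training data from the last two versions with the new batch
cat batch-v2.jsonl batch-v3.jsonl new-batch.jsonl > combined.jsonl

# Fine-tune starting from the model two versions back (v1), so the
# combined data gets a more even allocation of weight
openai api fine_tunes.create \
  -t combined.jsonl \
  -m "curie:ft-myorg:nickname-v1-2023-01-15-00-00-00" \
  --suffix "nickname-v4"
```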

3 Likes

Hi, @aaron5
In my case, I have 8 rounds of tuning. The first round uses the first set only. The second round uses the first and second sets. The third round uses sets one, two, and three, and so on.
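Concretely, a round looks roughly like this for me, assuming the CLI (file names are illustrative, and each round starts from the base model):

```
# Round 3: cumulative training data = sets one, two, and three
cat set1.jsonl set2.jsonl set3.jsonl > round3.jsonl

# Fine-tune from the base model on the cumulative set
openai api fine_tunes.create -t round3.jsonl -m curie --suffix "round3"
```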

The documentation says that you can only fine-tune from a base model, but it sounds like you and @daveshapautomator have insight into fine-tuning a fine-tuned model.

Were you able to get the answers to the questions you posed here? Do you have any further advice for this iterative fine-tuning?

Thank you so much!

Hi @rex.vanhorn. I’ve done multi-round training, but I do not typically reuse my fine-tune data, because I assume the “lesson was learned” the first time around.

There’s some documentation that suggests adjusting the weights if your subsequent round uses significantly less training data, but I haven’t explored this. Instead, I just try to run subsequent rounds with roughly the same amount of prompt/completion sets (maybe within 50% in either direction). Also, if I’m doing a subsequent round with less data, I make sure the examples are extremely high quality, in case they end up getting weighted higher on a sample-by-sample basis. Does this make sense?

EDIT - Here are the instructions for fine-tuning the fine-tuned model. I’ll paste my code as well.

To do this, pass in the fine-tuned model name when creating a new fine-tuning job (e.g. -m curie:ft-<org>-<date>)

2 Likes

@rex.vanhorn I use the command line to initiate the fine-tune. Here’s an example:

openai api fine_tunes.create -t [file ID of new training file] -m [x] --suffix "[y]"

x is the name of the existing fine-tuned model… typically [modelname]:ft-[org-name]:[nickname]-[datestring]
y is the new fine-tuned model name. I like to use versioning here, so I usually do it as nickname-v[n].
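So a concrete second-round command might look like this (the file ID, org name, and timestamp are placeholders, not real values):

```
# Fine-tune on top of the v1 model to produce v2
openai api fine_tunes.create \
  -t file-AbC123xyz \
  -m "curie:ft-myorg:support-v1-2023-01-15-00-00-00" \
  --suffix "support-v2"

# Optionally stream the new job's progress (job ID is a placeholder)
openai api fine_tunes.follow -i ft-XyZ789abc
```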

1 Like

Thanks, @aaron5!!

I tried your recommendation, and it worked (as far as I can tell). The system did create a fine-tuned model on top of another fine-tuned model. I’m not sure how I can confirm, though. In my test case, I simply fine-tuned with the prompt/completion:
prompt=Who is the handsomest man on planet Earth?
completion=Rex, obviously
Specifically:
{"prompt": "Who is the handsomest man on planet Earth?", "completion": "Rex, obviously."}

But then every time I submitted the prompt above to the model fine-tuned with that prompt/completion pair, I got some random variation on a typical GPT-3 output. In other words, it never recognized me as the handsomest man on planet Earth. :sob:

So that leaves me to ponder: how do I know that the original fine-tunings were captured in the subsequent fine-tuning? :thinking:
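For reference, I’ve been spot-checking it roughly like this, assuming the completions CLI (the model name is a placeholder for my latest fine-tune):

```
# Send the training prompt to the newest fine-tuned model
openai api completions.create \
  -m "curie:ft-myorg:rex-test-v2-2023-03-01-00-00-00" \
  -p "Who is the handsomest man on planet Earth?" \
  -M 16
```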

Try increasing n_epochs

It changes the rigidity of the learning. Default is 4, and you can use 2 for creative writing.

If you increase it, it forces the AI to remember the completions better, but it makes the model more rigid in its replies.

However, it is VERY helpful when you have a small training set. As a rule of thumb, small training sets should use a higher value than the default.
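For example, something like this (the epoch count, file name, and model names are illustrative only):

```
# Re-run the small set with a higher epoch count so the completions stick
openai api fine_tunes.create \
  -t tiny-set.jsonl \
  -m "curie:ft-myorg:my-model-v1-2023-02-20-00-00-00" \
  --n_epochs 8 \
  --suffix "my-model-v2"
```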

2 Likes

Interesting! I missed that in the fine-tuning guide. I tried 7 epochs here. It doesn’t return exactly the completion I provided, but it’s close enough for a chuckle.
Pretty cool. :slight_smile:

In my app, I get new data periodically, so every few days I fine-tune the model with the new data on top of the previously fine-tuned model. The issue is that after a few rounds of fine-tuning, the model partially forgets some of the old data, and it looks like the older the data, the worse the forgetting gets.

Is that expected? Or is it because my sample is not large enough? For each fine-tune round I have about 10-20 examples.

Yeah, this is way too low, I think. I would wait until you have 200 to 400+ samples to fine-tune again.

3 Likes

I have a couple of questions about this re-training process. Grateful if someone has any insight into these.

Let’s say I have a dataset of 3000 examples that I have previously fine-tuned a model with.
Now I have 200 new examples that I want to add.

In practice I collect all my data in one sheet and want to re-run this periodically as I get new data, so it would be simpler for me to just use my “master” dataset each time. But I don’t want to incur the full cost each time.

  1. Is it harmful (to the model) to include the original training data in addition to the new data when training on top of a fine-tuned model?

1b. If I include the original training data in addition to the new data, is it automatically removed somehow or will I incur the cost of the full dataset?

  2. The docs recommend reducing the learning_rate_multiplier by a factor of 2 to 4 when the new training data is much smaller. What is the default learning_rate_multiplier? I need to know this so I can reduce it by the right factor.
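Here’s the kind of follow-up job I have in mind, with the multiplier set explicitly (a sketch only: the file and model names are placeholders, and the 0.05 assumes a default of 0.1 reduced by a factor of 2):

```
# Fine-tune the ~200 new examples on top of the existing model,
# with an explicitly reduced learning rate multiplier
openai api fine_tunes.create \
  -t new-200-examples.jsonl \
  -m "curie:ft-myorg:master-v1-2023-04-01-00-00-00" \
  --learning_rate_multiplier 0.05 \
  --suffix "master-v2"
```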

Thanks!

1 Like