What exactly and technically happens with fine-tuning?

Hi, @marais
Thank you for your response. My hope is that your explanation is correct and that all the weights are updated during the tuning process. That would deeply incorporate the tuning data into the model, giving that data the same learning status as the data on which the model was initially trained. That's probably not a big deal in itself, except that it gives us the ability to continue improving GPT-3 where OpenAI left off.
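Just to make concrete what "all the weights are updated" means mechanically, here is a minimal toy sketch in numpy. This is a hypothetical two-layer linear model, nowhere near GPT-3, but it shows the key property of full fine-tuning: the gradient of the loss flows back to every weight matrix, so a single fine-tuning step moves every parameter, not just a final layer.

```python
import numpy as np

# Hypothetical toy "model": two weight matrices standing in for
# the pre-trained parameters. Not GPT-3, just an illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 4))   # "lower" layer, from pre-training
W2 = rng.normal(size=(4, 1))   # "upper" layer, from pre-training

x = rng.normal(size=(1, 4))    # one fine-tuning example
y = np.array([[1.0]])          # its target

# Forward pass: y_hat = x @ W1 @ W2, with squared-error loss.
h = x @ W1
y_hat = h @ W2
loss = float((y_hat - y) ** 2)

# Backward pass: gradients reach BOTH weight matrices.
d_yhat = 2 * (y_hat - y)
grad_W2 = h.T @ d_yhat
grad_W1 = x.T @ (d_yhat @ W2.T)

# One gradient-descent step of full fine-tuning:
lr = 0.01
W1_new = W1 - lr * grad_W1   # every parameter is updated...
W2_new = W2 - lr * grad_W2   # ...not only the top layer.
```

If instead only the top layer were tuned (a "frozen" setup), `W1` would be left out of the update entirely; full fine-tuning, as described above, touches everything.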

It does, however, introduce a bit of a pickle. Even if I fine-tune GPT-3 on 1 GB of text data, which is a lot of data, that would be a mathematical drop in the bucket compared to the roughly 520 GB of data already baked into the pre-trained GPT-3 model. The concern then becomes: what is the probability of even influencing GPT-3 with a reasonable amount of data? My understanding is that all of Shakespeare's written works come to only about 3.5 million characters. Taking one character as roughly one byte, that means the base training data of GPT-3 is on the order of 150,000 times the size of all of the Bard's works. That being the case, if we wanted to influence GPT-3 towards specific answers, how would that even be possible without a massive amount of data?
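The back-of-the-envelope arithmetic, using the figures from this thread (520 GB of pre-training text, ~3.5 million characters of Shakespeare) and the rough assumption that one character is about one byte:

```python
# Assumption: 1 character ≈ 1 byte of training text.
gpt3_training_bytes = 520 * 10**9   # ~520 GB, figure cited in this thread
shakespeare_chars = 3.5 * 10**6     # ~3.5 million characters

ratio = gpt3_training_bytes / shakespeare_chars
print(f"{ratio:,.0f}")   # → 148,571
```

So a 1 GB fine-tuning set is roughly 0.2% of the pre-training corpus, and Shakespeare's complete works are a tiny fraction of that again.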

As a parting question, do you happen to have a source or reference documenting that all the weights are updated? I trust you, but my major professor is more skeptical.