I am working with a university professor on a thesis project in which I fine-tune GPT-3 to answer doctrinal questions in a religious domain. These questions came up:
What exactly happens to GPT-3 during fine-tuning? E.g., are all the weights updated? Or only a subset of them? How are they updated? In a way similar to how the model was initially trained? Or differently? With the same optimization schema, but different hyperparameters? Or in some other way?
I’ve read through as much documentation as I can find but don’t have a complete, satisfactory answer. I wonder if someone could point me in the right direction.
I’m not an expert, so please take this with a grain of salt, but based on my experience working with OpenAI’s CLIP, fine-tuning pre-trained OpenAI models works via linear probing. Linear probing is a technique where you take the second-to-last layer of a NN (the layer just before the output layer) and further tune those weights from the base model using your dataset.
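For reference, “linear probing” in the sense of the CLIP paper means freezing the pretrained network entirely and training only a new linear layer on its penultimate-layer features. Here’s a minimal numpy sketch of that idea, with a fixed random projection standing in for the frozen encoder (none of this is OpenAI’s actual code, just an illustration of the technique):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed random projection.
# In real linear probing these would be the pretrained network's
# penultimate-layer features, and its weights would NOT be updated.
W_frozen = rng.normal(size=(10, 4))

def encode(x):
    return np.tanh(x @ W_frozen)  # frozen features

# Toy binary-classification data
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only this linear head is trained (logistic regression on frozen features).
w = np.zeros(4)
b = 0.0
lr = 0.5
for _ in range(500):
    feats = encode(X)
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    grad_logits = (p - y) / len(y)              # gradient of mean log loss
    w -= lr * feats.T @ grad_logits
    b -= lr * grad_logits.sum()

p_final = 1.0 / (1.0 + np.exp(-(encode(X) @ w + b)))
acc = ((p_final > 0.5) == y).mean()
print(f"train accuracy of linear probe: {acc:.2f}")
```

The point of the sketch is that only `w` and `b` ever change; `W_frozen` is untouched, which is the “superficiality” being discussed in this thread.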
Looking at the request body for the OpenAI fine-tuning API, it seems you can also set hyperparameters if you want to, although I’m not sure whether those apply to the training done on the base model, to the additional tuned layer, or to both.
I’ve also been looking for a more thorough answer, so please share if you find it!
Total admitted rube here, but instinctively your suggestion that tuning only affects the second-to-last layer implies a certain superficiality in tuned models, a concept which piqued my curiosity. If you can refer me to a resource that covers NNs or the other ideas you mentioned, I would be appreciative.
I’ve dealt with OpenAI advisors before. Their response was that ALL the weights are updated during fine-tuning, not just the final layers.
I’m assuming they are updated in a similar fashion to the initial training.
Hyperparameters can be provided at the time of fine-tuning.
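For anyone curious, the (legacy) fine-tunes endpoint documented a handful of tunable hyperparameters. A sketch of what such a request body looked like at the time of this thread; names are from the docs, but defaults and availability may have changed since, and the file ID is made up:

```python
# Sketch of a request body for the (legacy) OpenAI fine-tunes endpoint.
payload = {
    "training_file": "file-abc123",      # hypothetical uploaded-file ID
    "model": "curie",
    "n_epochs": 4,                       # passes over the training data
    "batch_size": 8,
    "learning_rate_multiplier": 0.1,     # scales the base learning rate
    "prompt_loss_weight": 0.01,          # down-weights loss on prompt tokens
}
```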
Thank you for your response. My hope is that your explanation is correct and that all the weights are updated during the tuning process. This would deeply incorporate the tuning data into the model and confer on the tuning data the same learning status as the data on which the model was initially trained. That’s probably not a big deal, except that it would give us the ability to continue improving GPT-3 where OpenAI left off.
It does, however, also introduce a bit of a pickle. Even if I fine-tune GPT-3 on 1 GB of text data, which is a lot of data, that would be a mathematical drop in the bucket compared to the roughly 520 GB of data on which GPT-3 was pre-trained. The concern then becomes: what is the probability of even influencing GPT-3 with a reasonable amount of data? My understanding is that all of Shakespeare’s written works total only about 3.5 million characters. At roughly one byte per character, that makes GPT-3’s base training corpus something like 150,000 times the size of all of the Bard’s works. That being the case, if we wanted to steer GPT-3 toward specific answers, how would that even be possible without a massive amount of data?
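As a check on the scale here (assuming roughly one byte per character, which is approximately true for English text):

```python
base_corpus_bytes = 520e9   # ~520 GB of pretraining text, per the post
shakespeare_chars = 3.5e6   # ~3.5 million characters in Shakespeare's works
finetune_bytes = 1e9        # the hypothetical 1 GB fine-tuning set

print(base_corpus_bytes / shakespeare_chars)  # ~149,000x Shakespeare
print(base_corpus_bytes / finetune_bytes)     # 520x the fine-tuning set
```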
As a parting question, do you happen to have a source or reference documenting that all the weights are updated? I trust you, but my major professor is more skeptical.
The more I look at it, the more I’m heading down the path of an embedding database within a limited domain, then using the model to do the final question-and-answer step with the context of the text I find via semantic search of my own embedding database.
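That retrieval approach can be sketched as cosine similarity over document embeddings. Here the vectors are tiny made-up stand-ins for whatever embedding model is actually used (real embeddings would have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed embeddings for a tiny corpus.
corpus = {
    "passage_a": np.array([0.9, 0.1, 0.0]),
    "passage_b": np.array([0.1, 0.8, 0.3]),
    "passage_c": np.array([0.0, 0.2, 0.9]),
}

query_embedding = np.array([0.85, 0.15, 0.05])  # embedding of the question

# Rank passages by cosine similarity and hand the best match to the
# completion model as context, instead of baking facts in via fine-tuning.
best = max(corpus, key=lambda k: cosine_sim(corpus[k], query_embedding))
prompt = f"Answer using this context:\n\n{best}\n\nQuestion: ..."
print(best)  # passage_a
```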
Also, can anyone shed light on the relevance of splitting the prompt and completion versus just having a completion when it comes to training?
Do you think OpenAI weighs the two halves differently? In my mind they are just one long continuation, or context.
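For what it’s worth, the fine-tuning docs at the time exposed a `prompt_loss_weight` parameter, which suggests the two halves are not weighted equally: loss on completion tokens counts in full, while loss on prompt tokens is heavily down-weighted. A sketch of that kind of weighted token loss, with made-up per-token numbers (not OpenAI’s actual implementation):

```python
import numpy as np

# Per-token cross-entropy losses for one training example (made-up values).
prompt_losses = np.array([2.1, 1.7, 1.9])   # tokens in the prompt half
completion_losses = np.array([3.0, 2.5])    # tokens in the completion half

prompt_loss_weight = 0.01  # legacy API default, per the docs at the time

# Completion tokens contribute fully; prompt tokens barely at all, so
# training mostly teaches the model to produce the completion.
total = completion_losses.sum() + prompt_loss_weight * prompt_losses.sum()
mean = total / (len(completion_losses) + prompt_loss_weight * len(prompt_losses))
print(round(mean, 3))
```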
I’ve found posts describing others’ experiments which led me to believe that GPT-3 compares the entered prompt with the fine-tuning prompts, and if the prompt matches or is close, GPT-3 responds with the submitted fine-tuning completion. I really hope that’s not the case; it seems overly simplistic and almost not useful (the same function could be performed more efficiently on the client side at submission time).
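To make concrete why that mechanism would be redundant: if fine-tuning were just “return the stored completion for a matching prompt,” a plain dictionary on the client would do the same job. The question-answer pair below is made up:

```python
# Hypothetical fine-tuning pairs held client-side.
finetune_pairs = {
    "Who wrote the epistle?": "Tradition attributes it to ...",  # made-up pair
}

def fake_finetuned_model(prompt):
    # Exact-match lookup; fall back to the base model otherwise.
    return finetune_pairs.get(prompt, "<fall back to base model>")

print(fake_finetuned_model("Who wrote the epistle?"))
```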
But others have mentioned that fine-tuning without a prompt does seem to actually post-train the model, and at least modify some of the weights. That’s really all I can contribute.
So my hypothesis did not bear out. I fine-tuned GPT-3 on one question prompt and one answer, and no matter what I try, I cannot get GPT-3 to give me back that specific answer. So, I assume that the fine-tuning must alter the weights in some way.
Hot off the presses! Thanks to some help from @aaron5 and @raymonddavey, who pointed me toward fine-tuning on top of fine-tunings and toward playing with the number of training epochs, I did get the model to spit out a facsimile of the submitted completion. This result still supports the general conjecture that fine-tuning does in fact change the weights. I would also say that it rejects the hypothesis that there is a checking mechanism that simply looks for submitted prompts in the fine-tuning data and mindlessly returns the fine-tuned completion.
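The epochs effect is easy to reproduce in miniature: gradient descent on a single example moves the output toward the target a little each epoch, so one pass barely changes the model while many passes approach memorization. A toy sketch using plain least-squares (not GPT-3’s actual objective, just the mechanism):

```python
import numpy as np

x = np.array([1.0, 0.5])   # stand-in for the single fine-tuning prompt
target = 2.0               # stand-in for the desired completion
w = np.zeros(2)            # "pretrained" weights we start from
lr = 0.1

def output(weights):
    return float(weights @ x)

def train(weights, n_epochs):
    weights = weights.copy()
    for _ in range(n_epochs):
        grad = 2 * (output(weights) - target) * x  # d/dw of (w@x - target)^2
        weights -= lr * grad
    return output(weights)

print(train(w, 1))    # after 1 epoch: 0.5, only a quarter of the way there
print(train(w, 200))  # after 200 epochs: ~2.0, the example is memorized
```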
As an aside, I remember seeing, months ago, a spot on the OpenAI community page where normies like us (or at least normies like me) could schedule some time with the experts. Now that I have a slightly better idea of what I’m doing, I would be interested in talking to an OpenAI expert.
Does anyone know if we can still do that?