I have been working on my bachelor’s thesis, which involves fine-tuning a GPT-3.5 Turbo model to do a specific task.
After fine-tuning the model, I plan to present it with prompts that include not only information from the training data but also new types of information not covered in the training data.
Specifically, I’m curious about the following:
Dependency on Training Data: How dependent are the model’s responses on the structure and content of the training data? Can the model effectively handle prompts that introduce new elements, for example system information, even if they were not included in the training data?
Flexibility in Prompt Usage: If my training data didn’t include specific system information, can I still introduce it in my prompts when using the fine-tuned model?
Handling XY Cases and Providing Hints via Prompts: Is it possible to guide a fine-tuned language model during TASK YZ? If so, how can this be achieved by merely modifying the prompt?
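On the second question, a small sketch may help. The example below is an assumption about how this could look with OpenAI-style chat-format training data (the model ID, category names, and helper function are hypothetical): the fine-tuning JSONL omits any system message, yet at inference time a system message with new information can still be prepended, since the model treats it as ordinary context.

```python
# Hypothetical training example: note there is NO system message in the
# fine-tuning data, only user/assistant turns.
train_example = {
    "messages": [
        {"role": "user", "content": "Classify: 'The server is down.'"},
        {"role": "assistant", "content": "incident"},
    ]
}

def build_inference_messages(user_content, system_info=None):
    """Build the messages list for the fine-tuned model, optionally
    prepending system information that never appeared in training."""
    messages = []
    if system_info is not None:
        messages.append({"role": "system", "content": system_info})
    messages.append({"role": "user", "content": user_content})
    return messages

# At inference time we introduce new system information anyway:
msgs = build_inference_messages(
    "Classify: 'Disk usage at 95%.'",
    system_info="You are a ticket classifier. A new category is available: 'capacity'.",
)
# msgs would then be passed as the `messages` argument of a chat completion
# request against the fine-tuned model ID.
```

Whether the model actually *uses* the new system information well is an empirical question; nothing in the API prevents you from supplying it.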
Fine-tuning just adds another layer of “neurons” to the model, without significantly changing already existing weights.
So the fine-tuned model will react to new information just like a non-fine-tuned one, unless you tune it to react to that information in a specific way.
Thank you for your response. I understand your point about how a fine-tuned model reacts to new information similarly to a non-fine-tuned one, unless it’s specifically tuned to respond to new types of information in a certain way.
However, I need clarification on your statement: “Fine-tuning just adds another layer of ‘neurons’ to the model.” From my understanding, a neural network comprises layers of neurons: an input layer, hidden layers, and an output layer. Your answer suggests that fine-tuning slightly tweaks the weights. But when you mention “adding another layer,” does this really imply the actual addition of new layers of neurons?
Could you elaborate on what happens to the output layer during this process? I would greatly appreciate it if you could provide academic papers or sources that offer a deeper understanding of what occurs during the fine-tuning of ChatGPT.
A detailed, scholarly explanation is essential for my thesis. I am truly thankful for your response, as it points me to areas where I need to delve deeper. I’ve already searched the forum and found this thread, but it doesn’t quite satisfy the academic depth I am seeking.
“Fine-tuning just adds another layer of ‘neurons’ to the model” - this was just a metaphor. As I understand it, fine-tuning changes some of the weights of the model (though in most cases not very significantly).
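To make that concrete, here is a deliberately toy sketch (plain Python, nothing to do with GPT internals): a few gradient steps on a fixed one-layer linear model. The point is only that tuning moves the values of the existing weights; the architecture, i.e. the number of layers and weights, stays the same.

```python
import random

random.seed(0)
w = [random.gauss(0, 1) for _ in range(3)]   # "pretrained" weights
x = [0.5, -1.0, 2.0]                          # one fine-tuning input
target = 1.0                                  # desired output for x

lr = 0.05
for _ in range(20):                           # a few small gradient steps
    y = sum(wi * xi for wi, xi in zip(w, x))
    err = y - target
    # gradient of the squared-error loss 0.5 * err**2 w.r.t. w_i is err * x_i
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]

# Same number of weights before and after; only their values moved.
```

Real fine-tuning of a large model works on the same principle (gradient updates to existing parameters), just at vastly larger scale and often with tricks like low learning rates or adapter methods to keep the updates small.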
But there are people on the forum who are WAY more technical on this question than me, so let’s wait for more comments.
Btw, this link may be helpful in your research: LLM Visualization