Thank you for your quick answer, @_j.
I agree a “small dataset” is not enough for good fine-tuning, but it should be enough to give me a glimpse of what is going on under the hood, since so little control is provided by the current API.
You mentioned “Otherwise your 100 would just be added to their 10 million”. However, I think the exact opposite happened for me: the fine-tuning used my ~100 examples to adjust the whole model (not just the last layers, for example).
Let me be more concrete. For the training/validation process I provided ~100 different examples of a same-topic conversation between two actors (their character definitions are in the system prompt). After fine-tuning (7 epochs) I tested the model. If I follow the script (screenplay), the model behaviour is decent. But if, in the middle of the conversation, the “user” actor makes an abstract segue and asks “What is the meaning of life?” (no similar sentence appears in the training dataset), the other actor’s answer is generated to follow the expected conversation route at that exact point and completely ignores the abstract question, even though there is a line in the system prompt telling it not to blindly follow the script, but to talk naturally.
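For reference, each of my training examples roughly follows the chat fine-tuning JSONL shape sketched below (the character names, prompt wording, and dialogue lines are placeholders, not my actual data):

```python
import json

# Placeholder sketch of one training example (not my real prompt or dialogue).
system_prompt = (
    "You are ACTOR_B, talking to ACTOR_A. Follow the general storyline, "
    "but do not blindly follow the script; talk naturally."
)

example = {
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "ACTOR_A: So, about the plan we discussed yesterday..."},
        {"role": "assistant", "content": "ACTOR_B: Right, I think we should start with the supplies."},
    ]
}

# ~100 such conversations, one JSON object per line in the training file.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```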
Does that mean I have to include as many general and abstract questions as possible in my training/validation data to preserve the model’s ability to generalize? How does the fine-tuning override all of that general knowledge from the base, pre-trained model? And why isn’t the system prompt itself helping with this?
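Concretely, is something like the sketch below (placeholder content again) what I would have to scatter throughout the training set, so the model learns that segues are allowed and should be answered before returning to the script?

```python
# Placeholder sketch: a hypothetical off-script example I might add to the dataset.
off_script_example = {
    "messages": [
        {"role": "system", "content": "You are ACTOR_B, talking to ACTOR_A. Follow the "
                                      "general storyline, but talk naturally."},
        {"role": "user", "content": "ACTOR_A: By the way, what is the meaning of life?"},
        # The assistant acknowledges the segue instead of ignoring it, then returns to the script.
        {"role": "assistant", "content": "ACTOR_B: Ha, that's a big one. Whatever we make of it, "
                                         "I suppose. Anyway, back to our plan."},
    ]
}
```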