Overview
I recently fine-tuned a curie
model with around 6,000 reviews of airlines and restaurants. I would like the model to generate reviews of other services (clothes, consumer electronics, etc.) to augment the training data for domain adaptation.
I spent quite some effort filtering the fine-tuning data, and I believe the examples are simple enough for the curie
engine to comprehend. My fine-tuning data looks like the following:
[{'prompt': 'A positive review on seat\n\n###\n\n',
  'completion': ' This seat was fantastic. ###'},
 {'prompt': 'A positive review on service\n\n###\n\n',
  'completion': ' That service is adorable. ###'},
 {'prompt': 'A positive review on airline\n\n###\n\n',
  'completion': ' That airline was awesome. ###'}]
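For reference, here is a minimal sketch of how examples like these could be serialized into the JSONL format that fine-tuning expects (one JSON object per line); the file name is illustrative:

```python
import json

# A small, illustrative subset of the prompt/completion pairs, with
# "###" used as the prompt separator and completion stop marker.
examples = [
    {"prompt": "A positive review on seat\n\n###\n\n",
     "completion": " This seat was fantastic. ###"},
    {"prompt": "A positive review on service\n\n###\n\n",
     "completion": " That service is adorable. ###"},
]

# Write the training file as JSONL: one JSON object per line.
with open("reviews_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```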
Issues
When I try to generate reviews of other products, the topics remain in the same domain as the fine-tuning data. For example, for the input prompt "A positive review on laptop"
(where I explicitly ask the model to generate a review of a laptop), the model returns the completions below.
To avoid repetition, I set both Temperature and Top P to 1, and Frequency penalty to 0.5.
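For clarity, these are the request parameters as they would look in a call to the legacy Completions API; the fine-tuned model id is a placeholder, and I have assumed the prompt should end with the same "\n\n###\n\n" separator used in fine-tuning:

```python
# Request parameters matching the settings described above.
params = {
    "model": "curie:ft-personal-2023-01-01",  # hypothetical fine-tuned model id
    "prompt": "A positive review on laptop\n\n###\n\n",  # same separator as training
    "max_tokens": 64,
    "temperature": 1,          # default sampling temperature
    "top_p": 1,                # default nucleus size
    "frequency_penalty": 0.5,  # discourage repeated tokens
    "stop": ["###"],           # the stop marker used in the completions
}

# With the pre-1.0 openai Python client, this would be sent as:
#   import openai
#   response = openai.Completion.create(**params)
```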
That service was wonderful. ###
That was an exciting pilot. ###
We like that flight. ###
I hate that pilot. ###
I appreciate the customer service. ###
Questions
I suspect the model is overfitting to the fine-tuning data and that catastrophic forgetting has occurred. I am not sure how to resolve this. I know one straightforward fix is to fine-tune another model on more diverse data, but that is not really feasible in my case.