Fine-tuned models overfit to the fine-tuning data during text generation?

Overview

I recently fine-tuned a curie model with around 6,000 reviews of airlines and restaurants. I would like the model to generate reviews of other products and services (like clothes, consumer electronics, etc.) to augment the training data for domain adaptation.

I spent quite some effort filtering the fine-tuning data, and I believe the examples are simple enough for the curie engine to comprehend. My fine-tuning data looks like the following:

[{'prompt': 'A positive review on seat\n\n###\n\n',
  'completion': ' This seat was fantastic. ###'},
 {'prompt': 'A positive review on service\n\n###\n\n',
  'completion': ' That service is adorable. ###'},
 {'prompt': 'A positive review on airline\n\n###\n\n',
  'completion': ' That airline was awesome. ###'}]
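
For context, here is roughly how I ran the fine-tune. This is a minimal sketch using the legacy openai Python SDK (v0.x); the API key and file name are placeholders, and the two examples just mirror the format above:

import json
import openai  # legacy v0.x SDK, which exposes the fine-tunes endpoint used for curie

openai.api_key = "sk-..."  # placeholder

# A couple of examples in the same format as above.
examples = [
    {"prompt": "A positive review on seat\n\n###\n\n",
     "completion": " This seat was fantastic. ###"},
    {"prompt": "A positive review on service\n\n###\n\n",
     "completion": " That service is adorable. ###"},
]

# Fine-tunes expect JSONL: one JSON object per line.
with open("reviews.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then start a fine-tune on the curie base model.
upload = openai.File.create(file=open("reviews.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=upload.id, model="curie")
print(job.id)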

Issues

When I tried to generate reviews of other products, the topics remained in the same domain as the fine-tuning data. For example, for the input prompt "A positive review on laptop" (where I explicitly asked the model to generate a review about a laptop), the model returned the completions below.

To avoid repetitive outputs, I set both Temperature and Top P to 1, and Frequency penalty to 0.5.
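
Concretely, each sample was generated with a call roughly like this (a sketch against the legacy Completions endpoint; the fine-tuned model name and max_tokens are placeholders):

import openai  # legacy v0.x SDK

openai.api_key = "sk-..."  # placeholder

response = openai.Completion.create(
    model="curie:ft-personal-2023-01-01",  # placeholder fine-tuned model name
    prompt="A positive review on laptop\n\n###\n\n",  # same separator as in training
    max_tokens=32,          # placeholder length budget
    temperature=1,          # sample at full temperature
    top_p=1,                # no nucleus truncation
    frequency_penalty=0.5,  # penalize repeated tokens
    # stop=[" ###"],        # optionally strip the end-of-completion marker
)
print(response.choices[0].text)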

 That service was wonderful. ###

 That was an exciting pilot. ###

 We like that flight. ###

 I hate that pilot. ###

 I appreciate the customer service. ###

Questions

I suspect the model has overfit to the fine-tuning data and that catastrophic forgetting has occurred, but I am not sure how to resolve this. I know one straightforward fix is to fine-tune another model on more diverse data, but that is not feasible in my case. Is there another way to mitigate this without collecting new data?