Hi everyone,
I’ve been fine-tuning GPT-4.1 and GPT-4o using SFT. My goal isn’t to add domain knowledge — I just want the model to follow a specific style and persona. The dataset is small (around 60 training samples + 10 validation), all written in Korean, and the model is also supposed to respond in Korean.
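For context, each example in my dataset uses the standard chat-format JSONL for SFT, and I start the job through the usual SDK call. The snippet below is only an illustration of that setup (placeholder persona and content, hyperparameters left at their defaults), not my actual data or script:

```python
# Illustrative sketch of my setup, not the real dataset (placeholder content).
import json
from openai import OpenAI

# One training example: the system prompt defines the persona, the assistant turn
# demonstrates the target style. My real examples are all written in Korean.
example = {
    "messages": [
        {"role": "system", "content": "<persona definition, in Korean>"},
        {"role": "user", "content": "<Korean user prompt>"},
        {"role": "assistant", "content": "<Korean answer in the target style>"},
    ]
}

# Write the (single, placeholder) example as one JSONL line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")

client = OpenAI()

# Upload the file and start an SFT job.
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-4o-2024-08-06",  # same flow for the gpt-4.1 snapshot
)
print(job.id)
```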
The strange part is that the fine-tuned models behave in ways the base models never do:
- drifting out of context,
- giving unrelated or nonsensical answers,
- repeating phrases or falling into loops (even when I tell it not to),
- sticking to the previous topic after I change subjects,
- obsessively mimicking patterns from the training data.
I expected some overfitting with a small dataset, but this feels less like overfitting and more like the model becoming unstable, especially since I'm only trying to adjust style/persona, not teach new knowledge.
Before I scale up the dataset, I wanted to ask: is this normal when fine-tuning GPT-4.x with small SFT datasets, especially with non-English data? Or does it sound like something else is going wrong? And is there any known workaround to reduce the looping or context issues?
Any advice or similar experiences would be really appreciated. Thanks.