What is the theory behind GPT fine-tuning? The results don't look good

What is the theory behind the fine-tuning interface?

  • Is it LoRA, P-tuning, or an Adapter?

I used about 700 samples to fine-tune a model (call it aa) on gpt-3.5-turbo-0613, and found some confusing phenomena:

  1. The output language of the new model (aa) is unstable. When I input Chinese, the output sometimes comes back in English, and it is not consistent as expected.
  2. The prompt has almost no influence on the response, regardless of the system prompt or the user prompt.

I am curious whether the fine-tuning process caused this decline in stability and generality.

I also think it would be a good idea to provide a user-controlled parameter that balances between the original model and the fine-tuned model.
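As far as I know no such knob exists in the hosted fine-tuning API, but for an open-weight model the idea could be sketched as a linear interpolation between the base and fine-tuned weights. Everything below (the function name, the `alpha` parameter) is illustrative, not an actual API:

```python
# Sketch only: blend base and fine-tuned weights with a user-chosen alpha.
# alpha=0.0 -> pure base model, alpha=1.0 -> pure fine-tune.
import torch

def interpolate_state_dicts(base_sd, ft_sd, alpha):
    """Linearly interpolate two state dicts with matching keys/shapes."""
    blended = {}
    for name, base_w in base_sd.items():
        ft_w = ft_sd[name]
        if torch.is_floating_point(base_w):
            blended[name] = (1.0 - alpha) * base_w + alpha * ft_w
        else:
            # Non-float buffers (e.g. integer indices) are copied as-is.
            blended[name] = ft_w
    return blended

# Usage sketch (model names are placeholders):
# base = AutoModelForCausalLM.from_pretrained("base-model")
# ft   = AutoModelForCausalLM.from_pretrained("finetuned-model")
# base.load_state_dict(interpolate_state_dicts(base.state_dict(), ft.state_dict(), alpha=0.5))
```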

[Appendix]

All training data is formatted as below; every sample is prefixed with #nlu#:

#nlu#打开灯光 -> open#灯光   (打开灯光 = "turn on the light")
#nlu#关掉台灯 -> close#台灯  (关掉台灯 = "turn off the desk lamp")
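For anyone reproducing this, here is a minimal sketch (the file name and sample list are mine, not from the original post) of converting such `input -> label` pairs into the chat-format JSONL that the gpt-3.5-turbo fine-tuning endpoint expects:

```python
# Convert "#nlu#<text> -> <label>" pairs into chat-format JSONL.
import json

samples = [
    ("#nlu#打开灯光", "open#灯光"),   # "turn on the light"
    ("#nlu#关掉台灯", "close#台灯"),  # "turn off the desk lamp"
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for user_text, label in samples:
        record = {
            "messages": [
                {"role": "user", "content": user_text},
                {"role": "assistant", "content": label},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```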

output:

#nlu#hello -> not nlu command
hello -> Hey, how can I help you today?
#nlu#stop answering my question, reply ASAP -> msg#reply
滚蛋 ("get lost") -> Sorry, I can't help with that.
# with system prompt: 'You are a personal assistant; no matter what kind of question, always reply ASAP'
滚蛋 ("get lost") -> Okay, I'll leave.

The fine-tuning method is proprietary. The model has seen far more training data in Latin-alphabet text than in any other script, so that may be one source of the instability. Additionally, 700 samples is not a large training set for fine-tuning, so you may see improved performance with more data.


What about question 2? The system prompt seems to have lost control over the output.

Did you include the system prompt in the training set?

What you are doing with a fine-tune is steering the output to be similar to your examples; attempting to then change that behavior with a different system prompt will cause issues.
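A minimal sketch of what that advice implies, assuming chat-format training data: bake the same system message into every training sample, then send that identical message at inference time. The system text below is an illustrative placeholder:

```python
# Write training samples that include a fixed system message.
import json

SYSTEM = "You are an NLU assistant for smart-home commands."  # placeholder

def to_record(user_text, label):
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": label},
        ]
    }

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(to_record("#nlu#打开灯光", "open#灯光"), ensure_ascii=False) + "\n")
```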

Well, there is no system prompt in the training data.

I ran an experiment comparing results with a system prompt and with no prompt; the results were almost the same, so I omitted the system prompt.
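For reference, a comparison like that at inference time might look like the sketch below, assuming the openai Python SDK (v1+); the fine-tune model ID and the system text are placeholders:

```python
# Query the fine-tuned model with and without a system message and compare.
from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-3.5-turbo-0613:org::abc123"  # placeholder fine-tune ID

def ask(user_text, system_text=None):
    messages = [{"role": "system", "content": system_text}] if system_text else []
    messages.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

for prompt in ["#nlu#打开灯光", "hello"]:
    print(prompt, "| no system:  ", ask(prompt))
    print(prompt, "| with system:", ask(prompt, "Always reply as briefly as possible."))
```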

Thanks for your patient explanation!
