Fine-tuned with wrong data initially


I have an issue and wonder if anyone else has experienced this. I initially fine-tuned my model with a wrong set of data. In that fine-tuning data, the assistant simply replied with the exact prompt.

After realizing my mistake, I deleted the flawed fine-tuned model, fixed the JSONL, and fine-tuned again, this time on the correct data.

The fine-tuning data teaches a very specific style: one-paragraph summaries of articles.

However, when I use the new model, trained on the correct data, it behaves as if it's the old model and simply parrots the input again.

Any advice on what to do now?

Could you please provide detailed information on the process you used to update the training data?

For example, which API calls you use to upload the updated data and to train the fine-tuned model.

That way we can guide you properly.


I am not a programmer, so I'm not sure I understand the question correctly, but let me try to answer.

I fine-tuned using the new fine-tuning feature for the GPT-3.5 model. I simply used the OpenAI guide and colabs to validate the data and then fine-tune.
I also used the guide to delete the flawed fine-tuned model once I realized I had given it bad data.

Then I created another fine-tune as a new job with the correct data.

My data has a very simple format, with only one system prompt, one user prompt, and one answer.
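As a minimal sketch of what a validation pass over data in that format could look like (this is a hypothetical helper, not the OpenAI cookbook's validator), one can check that every JSONL line contains exactly the three messages described:

```python
import json

# The three-message shape described above: one system, one user, one answer.
EXPECTED_ROLES = ["system", "user", "assistant"]

def validate_line(line: str) -> bool:
    """Return True if the JSONL line is a well-formed three-message example."""
    record = json.loads(line)
    roles = [m["role"] for m in record.get("messages", [])]
    return roles == EXPECTED_ROLES

sample = ('{"messages": ['
          '{"role": "system", "content": "Summarize articles."}, '
          '{"role": "user", "content": "<article text>"}, '
          '{"role": "assistant", "content": "<one-paragraph summary>"}]}')
print(validate_line(sample))  # True
```

Running a check like this over every line before uploading catches malformed examples early.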

Then what is the issue here?

The issue is that the model fine-tuned on the correct data behaves like the model fine-tuned on the wrong data.

With the first model, because the prompt was duplicated as the assistant response in the data, it output exactly that when prompted: it repeated the prompt.

The corrected data had the correct prompt (which is an article) and the correct assistant response (a summary). I trained a new model based on the corrected data.

But the new corrected fine-tuned model, when prompted, simply parrots the user prompt instead of providing a summary. It behaves exactly like the old, flawed model, which I deleted.

Let me put it in a rather flowery way: if you take a cup of coffee, first pour bad milk into it, and then pour in good milk, you have to pour in a lot of good milk before you no longer notice the bad milk.
It would be better to start over with a fresh coffee.


:))) cool metaphor. I understand. I did create a new fine-tuned model. I did not further fine-tune the flawed one and, as far as I know, that's not currently possible. I even deleted the flawed, old model.
Am I missing something about how to start anew?


Nah, I think you are good to go. Take the new one and give it the good data; there is no way for the model to surface the previously inserted wrong data, since it wasn't used for training.

What model are you using, and what data format are you providing it?

Can you share a line of training data, and the prompting you are giving the AI?

If this is how I train gpt-3.5 with examples in the needed format:

{"messages": [
{"role": "system", "content": "You are Awesome-O!"}, 
{"role": "user", "content": "Hey Awesome-O, can you be my friend?"}, 
{"role": "assistant", "content": "(weak, weak!.)"}

(line feeds shown not permitted)

Then my roles in use must reflect that same training style to solicit my trained performance:

messages = [
{"role": "system", "content": "You are Awesome-O!"},
{"role": "user", "content": "Sing a song about being my best friend!"}
]

This won’t cut it:

messages = [
{"role": "system", "content": "You are ChatGPT, a large language model…"},
{"role": "user", "content": "Sing a song about being my best friend!"}
]

Completion models like davinci-002 require a particular prompting style, and you must already be familiar with how to solicit output from them in order to move forward with custom results. They like to repeat and train themselves on their own output, hence the repetition penalty.
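As a minimal sketch of the point above (assuming the OpenAI Python SDK v1.x; the model ID and system prompt here are placeholders, not the poster's actual values), reusing the exact training-time system message when calling the fine-tuned model looks like this:

```python
# Placeholder: the exact system message used in the training JSONL.
TRAINING_SYSTEM_PROMPT = "You are Awesome-O!"

def build_messages(user_prompt: str) -> list:
    """Mirror the training format: the same system message, then the user turn."""
    return [
        {"role": "system", "content": TRAINING_SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

# Calling the fine-tuned model would then look roughly like:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="ft:gpt-3.5-turbo-0613:my-org::abc123",  # placeholder model ID
#     messages=build_messages("Sing a song about being my best friend!"),
# )
```

Keeping the system message identical to training is what makes the fine-tuned behavior reliably kick in.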


Thanks. This might indeed be the issue, because I did not keep the exact same system message.

I’ll try again with the exact system message, and if that does not work, I’ll share the exact training data.


Just an update on this: the issues were caused by my own error, as the “repaired” data still had some flawed examples somewhere in the middle which I had not noticed.
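A check for exactly this failure mode is easy to script. The sketch below (a hypothetical helper, assuming the three-message JSONL format discussed in this thread) scans a training file for examples where the assistant reply merely repeats the user prompt:

```python
import json

def find_parroting_examples(path: str) -> list:
    """Return 1-based line numbers of JSONL examples where the assistant
    content is identical to the user content -- the flaw in this thread."""
    bad_lines = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            messages = json.loads(line)["messages"]
            by_role = {m["role"]: m["content"] for m in messages}
            user = by_role.get("user", "").strip()
            assistant = by_role.get("assistant", "").strip()
            if user and user == assistant:
                bad_lines.append(lineno)
    return bad_lines
```

Running this over the JSONL before uploading would have flagged the duplicated examples hiding in the middle of the file.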