Should prompts be unique for fine-tuning?

jordanarsenault · September 16, 2022, 11:45am

I created a fine-tuned Davinci model with 508 prompt/completion pairs. It was based on synthetic data I created using the Davinci model.

However, I find the quality of outputs is worse with my fine-tuned model than with the regular Davinci model.

I have a lot of duplicate prompts in my prompt/completion pairs (not duplicate completions).

There are 13 different prompts, all with different completions.

Would this be causing it to perform worse?

daveshapautomator · September 16, 2022, 4:10pm

The demarc (separator) token you use between prompt and completion could be degrading performance. Even whether or not you put a space after the demarc can cause strange changes in performance.

jordanarsenault · September 16, 2022, 4:34pm

Here are a couple example prompt/completion pairs. Would you suggest I rerun a training with any changes to the prompt/completion pair structure?

{"prompt":"Input: An ebook on how to use CRM to manage your team more effectively\nOutput:","completion":" From Chaos to Customer Management in 30 Days or Less END"}
{"prompt":"Input: An ebook on how to use CRM to manage your team more effectively\nOutput:","completion":" You're not managing your team effectively if you're not using CRM END"}
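For anyone wanting to audit a similar file before retraining, here is a minimal sketch (not an official tool) that counts how many distinct prompts a JSONL training set contains and flags completions that lack the leading space or the ` END` stop sequence used in the pairs above. The function name `summarize_pairs` and its `stop` parameter are made up for illustration.

```python
import json
from collections import Counter

def summarize_pairs(jsonl_text, stop=" END"):
    """Count distinct prompts and flag completions with inconsistent formatting."""
    prompts = Counter()
    problems = []
    for n, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        pair = json.loads(line)
        prompts[pair["prompt"]] += 1
        if not pair["completion"].startswith(" "):
            problems.append((n, "completion lacks leading space"))
        if not pair["completion"].endswith(stop):
            problems.append((n, "completion lacks stop sequence"))
    return prompts, problems

# The two example pairs from this post, serialized as JSONL.
shared_prompt = "Input: An ebook on how to use CRM to manage your team more effectively\nOutput:"
sample = "\n".join([
    json.dumps({"prompt": shared_prompt,
                "completion": " From Chaos to Customer Management in 30 Days or Less END"}),
    json.dumps({"prompt": shared_prompt,
                "completion": " You're not managing your team effectively if you're not using CRM END"}),
])

prompts, problems = summarize_pairs(sample)
print(len(prompts), "distinct prompt(s) across", sum(prompts.values()), "pairs")
print(problems)
```

A very low distinct-prompt count relative to the number of pairs (13 versus 508 in my case) shows how little prompt variety the model sees during training.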

Fusseldieb · September 16, 2022, 4:44pm

I've seen that including the same prompt in every line seems to hurt performance.
Include the prompt in the normal request and fine-tune only on the unique parts.
But don’t listen to me, since I’m still quite a novice. Instead, listen to @daveshapautomator. He’s great!

EDIT: Maybe try to generate 100s of variations of the phrase “An ebook on how to use CRM to manage your team more effectively” on the playground and then use those as the Input prompt for the fine tuning.

But from my perspective this doesn’t seem like a task that needs fine-tuning. Coming up with catchphrases is already a task that Davinci does extremely well. Fine-tuning just clamps the creativity down (it becomes deterministic), unless you have thousands of unique, curated examples.

jordanarsenault · September 16, 2022, 7:01pm

Thanks for the ideas @daveshapautomator & @Fusseldieb.

I ended up rerunning the same training with some variations to the epochs and learning rate multiplier, and the results are much better now.

Fusseldieb · September 18, 2022, 1:42am

Interesting!

Did you use fewer epochs?
I’ve never tried fiddling with the learning rate multiplier either. What did it do, and which config did you use (for how many examples)?

aaron5 · September 18, 2022, 3:12am

Why did you add a space in front of the completion results? Just curious.

jordanarsenault · September 18, 2022, 11:50am

Yes, I tried it with fewer epochs. Total prompt/completion pairs: 508.

The default number of epochs is 4, which didn’t give me great results. I ran it with 1 epoch and with 2 epochs. 1 epoch gave slightly better results for creative text. Both 1 and 2 epochs gave better results than 4 epochs for creative text.

I also tried adjusting the prompt learning rate, but I don’t have anything definitive on the results from that. I only tried setting it manually to .02 and .05. They both produced good results.
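As a rough sketch of how such a sweep could be laid out, the snippet below builds one settings dict per combination of the values mentioned above. It assumes the legacy fine-tunes parameter names `n_epochs` and `learning_rate_multiplier`, and it assumes the rate adjusted here is the learning rate multiplier mentioned earlier in the thread. The `build_runs` helper and the `file-PLACEHOLDER` ID are hypothetical, nothing is sent to the API, and the grid is not necessarily the exact set of runs performed.

```python
from itertools import product

def build_runs(epochs, lr_multipliers, training_file="file-PLACEHOLDER", model="davinci"):
    """Return one settings dict per (epochs, learning rate multiplier) combination."""
    return [
        {
            "training_file": training_file,   # placeholder, not a real file ID
            "model": model,
            "n_epochs": n,                    # assumed legacy parameter name
            "learning_rate_multiplier": lr,   # assumed legacy parameter name
        }
        for n, lr in product(epochs, lr_multipliers)
    ]

runs = build_runs(epochs=[1, 2], lr_multipliers=[0.02, 0.05])
for run in runs:
    print(run)
```

Keeping each configuration as a plain record makes it easier to line results up against settings afterwards.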

jordanarsenault · September 18, 2022, 11:54am

I ran the Python fine tuning data preparation tool on my data set, and it recommended adding a space.

It has to do with how they tokenize words that have a whitespace before them.
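My understanding is that GPT-style tokenizers usually fold the space before a word into that word's token, so a completion that starts with a space matches how words normally appear mid-text. Below is a small sketch for normalizing completions that way. The helper name `ensure_leading_space` is made up for illustration and is not part of the data preparation tool.

```python
def ensure_leading_space(completion):
    """Return the completion with exactly one leading space."""
    return " " + completion.lstrip(" ")

print(repr(ensure_leading_space("From Chaos to Customer Management in 30 Days or Less END")))
print(repr(ensure_leading_space("  already padded END")))
```

The function is idempotent, so running it over a file that already has the leading spaces leaves it unchanged.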

Here is more info from the docs. OpenAI API
