The demarc (separator) token you use could be degrading performance. Even whether or not you put a space after the demarc can cause strange changes in performance.
Here are a couple of example prompt/completion pairs. Would you suggest I rerun the training with any changes to the prompt/completion pair structure?
{"prompt":"Input: An ebook on how to use CRM to manage your team more effectively\nOutput:","completion":" From Chaos to Customer Management in 30 Days or Less END"}
{"prompt":"Input: An ebook on how to use CRM to manage your team more effectively\nOutput:","completion":" You're not managing your team effectively if you're not using CRM END"}
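For anyone wanting to sanity-check pairs like the two above, here is a minimal sketch of a format check. It assumes the conventions visible in those examples: every prompt ends with the demarc `\nOutput:` (no trailing space), every completion starts with exactly one space, and every completion ends with ` END`. The function name `check_pair` and the two constants are made up for illustration.

```python
import json

# Assumed conventions, taken from the two example pairs in this thread.
SEPARATOR = "\nOutput:"  # demarc at the end of every prompt
STOP = " END"            # stop sequence at the end of every completion


def check_pair(line: str) -> list[str]:
    """Return a list of formatting problems found in one JSONL training line."""
    record = json.loads(line)
    prompt, completion = record["prompt"], record["completion"]
    problems = []
    # A trailing space after the demarc also fails this check.
    if not prompt.endswith(SEPARATOR):
        problems.append("prompt does not end with the separator")
    if not completion.startswith(" ") or completion.startswith("  "):
        problems.append("completion should start with exactly one space")
    if not completion.endswith(STOP):
        problems.append("completion does not end with the stop sequence")
    return problems


sample = json.dumps({
    "prompt": "Input: An ebook on how to use CRM to manage your team more effectively\nOutput:",
    "completion": " From Chaos to Customer Management in 30 Days or Less END",
})
print(check_pair(sample))  # prints []
```

Running this over every line of the training file before uploading would at least rule out inconsistent separators or spacing as the cause.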
I saw that when the same prompt is included in every line, performance seems to suffer.
Include the prompt in the normal request and fine-tune only on the unique parts.
But don’t listen to me, since I’m quite a novice, still. Instead, listen to @daveshapautomator. He’s great!
EDIT: Maybe try to generate hundreds of variations of the phrase “An ebook on how to use CRM to manage your team more effectively” in the Playground and then use those as the Input prompts for the fine-tuning.
But from my perspective this doesn’t seem like a task that needs fine-tuning. Coming up with catchphrases is already a task that Davinci does extremely well. Fine-tuning just clamps the creativity down (it becomes deterministic), unless you have thousands of unique, curated examples.
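A rough sketch of what that could look like once the variations exist: pair each reworded input with a completion and write the records out as JSONL. The helper `build_records` is made up for illustration, and the second input below is a hypothetical paraphrase, not real training data. The separator and stop sequence are the ones from the example pairs earlier in the thread.

```python
import json


def build_records(variations, completions, separator="\nOutput:", stop=" END"):
    """Pair the i-th input variation with the i-th completion."""
    records = []
    for text, title in zip(variations, completions):
        records.append({
            "prompt": f"Input: {text}{separator}",
            "completion": f" {title}{stop}",  # leading space, then stop sequence
        })
    return records


variations = [
    "An ebook on how to use CRM to manage your team more effectively",
    "A guide to running your team better with a CRM",  # hypothetical paraphrase
]
completions = [
    "From Chaos to Customer Management in 30 Days or Less",
    "You're not managing your team effectively if you're not using CRM",
]

# One JSON object per line, which is the JSONL layout used in this thread.
for record in build_records(variations, completions):
    print(json.dumps(record))
```

This way no two training lines share an identical prompt, which is the point of the suggestion.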
Did you use fewer epochs?
I’ve never tried fiddling with the learning rate multiplier either. What did it do, and which config did you use (for how many examples)?
Yes, I tried it with fewer epochs. Total prompt/completion pairs: 508.
The default number of epochs is 4, which didn’t give me great results. I ran it with 1 epoch and 2 epochs. 1 epoch gave slightly better results for creative text. Both 1 and 2 epochs gave better results than 4 epochs for creative text.
I also tried adjusting the prompt learning rate, but I don’t have anything definitive from that. I only tried setting it manually to .02 and .05. They both produced good results.
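A quick bit of arithmetic on those epoch counts, as a rough sketch: one epoch is one full pass over the data, so with 508 pairs the number of examples the model sees scales linearly with the epoch count. The function name `examples_seen` is made up for illustration.

```python
TOTAL_PAIRS = 508  # prompt/completion pairs in this training set


def examples_seen(total_pairs: int, n_epochs: int) -> int:
    """Each epoch is one full pass over every training pair."""
    return total_pairs * n_epochs


for n_epochs in (1, 2, 4):
    print(n_epochs, examples_seen(TOTAL_PAIRS, n_epochs))
# prints:
# 1 508
# 2 1016
# 4 2032
```

At 4 epochs every pair is seen four times, which may be part of why creative output looked better at 1 or 2 epochs on a set this small. That is a guess, not something measured here.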