I’m working on a project that will use GPT-3 to read product reviews and summarize them into one holistic product review.
So far I’ve gotten OK results using prompt engineering, and I was about to start preparing a dataset for fine-tuning when a fellow developer suggested it may not help and might even worsen results.
His reasoning was that product reviews are very common, so GPT-3 was surely already trained on lots and lots of product review data. So it’s better to concentrate on prompt engineering, or maybe try n-shot prompting.
That would, of course, save some cash on creating the dataset, but I’m still not convinced about that approach.
What’s your take on fine-tuning for this use case?
I’m not worried about the cost of the actual training; it’s the dataset preparation that will take time, effort, and money. This is a personal project with no company behind me, so even $1K is meaningful.
To get a good dataset, I’d need professional blog writers to read a few hundred review articles and then write their own summaries of them according to the subtitles.
I would suggest engineering a few-shot Davinci prompt that generates amazing results some of the time.
Use all of the best results as synthetic data to create your training dataset.
First, try fine-tuning Curie on your dataset to evaluate the output quality.
If the output is not up to scratch, then fine-tune a Davinci model.
At LitRPG Adventures, I’ve fine-tuned on around 4,000 character backstories to good effect. Those backstories were generated by GPT-3 and lightly edited by me. With the price cut to normal Davinci, I’ve stopped using the fine-tuned model for now, as it’s more expensive than giving Davinci a couple of examples (i.e., a 2-shot prompt).
So… yes, GPT-3 can be useful for generating fine-tuning data, but you want to be careful that you’re not falling prey to “garbage in / garbage out.” The better the content you feed the model for your fine-tune, the more accurate and useful it will be. It’s a balancing act.
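One cheap guard against garbage-in/garbage-out is a mechanical filter pass before a generated summary goes into the training set. This is just a sketch with arbitrary placeholder thresholds; a human skim (like the light editing mentioned above) is still the real quality gate:

```python
def passes_basic_checks(summary, source_reviews, min_words=20, max_words=120):
    """Crude heuristics to drop obviously bad generations before human review.
    Thresholds are placeholders -- tune them for your own data."""
    words = summary.split()
    if not (min_words <= len(words) <= max_words):
        return False  # too short to be holistic, or rambling
    if summary.count("!") > 3:
        return False  # over-excited marketing tone
    # Reject near-verbatim copies of a source review (lazy echoing).
    if any(summary.strip() in review for review in source_reviews):
        return False
    return True
```

Even a filter this crude skews the synthetic dataset toward the model's better outputs, which is most of what the fine-tune learns from.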