Here’s an actual example for the JSONL files, something I’m using for another product:
{"prompt": "energetic -->", "completion": " ${p.Sdesc} was always an energetic person. ${p.She} never really knew ${p.her} limits when it came to staying up late or doing something for the first time, and ${p.she} didn't care much about the consequences either.###"}
{"prompt": "lethargic -->", "completion": " ${p.She}'s not sure what ${p.his} problem is, but lately ${p.she}'s just been feeling more and more exhausted. It's a struggle to get up in the mornings and even if ${p.she} does make it out of bed, ${p.she} feels like a zombie.###"}
{"prompt": "talkative -->", "completion": " ${p.Sdesc} chatters incessantly. ${p.She} talks about ${p.her} family, the gossip among the upper class, and the latest trends among the lower. ${p.Her} conversation topics range from what ${p.she} ate for breakfast to ${p.her} latest vacation.###"}
{"prompt": "leader -->", "completion": " ${p.She} had always been the one in ${p.her} family that was able to take charge. ${p.She} made decisions, because ${p.she} knew what needed to be done. ${p.She} never asked for help with anything, and never listened to anyone's opinion but ${p.her} own. ${p.She} didn't see ${p.herself} as a leader, but ${p.she} knew when to step up and take control.###"}
{"prompt": "optimist -->", "completion": " ${p.She} is constantly building bridges between two sides. ${p.She} never takes sides and always tries to find the good in everyone. ${p.His} optimism and positivity has made ${p.him} popular.###"}
{"prompt": "emotional -->", "completion": " ${p.Sdesc} was always passionate about everything ${p.she} did. ${p.His} emotions were always close to the surface. ${p.She} was never afraid to express ${p.his} feelings, even if it meant getting into arguments with others. ${p.She} was always fiery and intense, both in ${p.her} words and ${p.her} actions.###"}
{"prompt": "hot-blooded -->", "completion": " ${p.She} is always the life of the party - always ready for a good time, and always up for a fight. ${p.She} is passionate about everything ${p.she} does, and ${p.she} doesn't hold back when it came to expressing ${p.his} feelings.###"}
{"prompt": "warm -->", "completion": " ${p.She} never seems to stop being happy. ${p.She} is always smiling, never too tired for a laugh.###"}
{"prompt": "chatty -->", "completion": " ${p.Sdesc} chatters incessantly. ${p.She} talks about ${p.her} family, the gossip among the upper class, and the latest trends among the lower. ${p.Her} conversation topics range from what ${p.she} ate for breakfast to ${p.her} latest vacation.###"}
Note that each prompt ends with the separator --> and each completion starts with a space. Each completion ends with ###, which I’m using as a stop sequence.
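Those three conventions are easy to check mechanically before uploading. Here is a minimal sketch in Python, assuming the exact separator and stop sequence used in the samples above. The function names check_line and check_file are made up for illustration and are not part of any OpenAI tooling.

```python
import json

PROMPT_SUFFIX = " -->"  # separator that ends every prompt in the samples
STOP = "###"            # stop sequence that ends every completion


def check_line(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    prompt = record.get("prompt")
    completion = record.get("completion")
    if not isinstance(prompt, str) or not isinstance(completion, str):
        return ["missing 'prompt' or 'completion' string"]

    problems = []
    if not prompt.endswith(PROMPT_SUFFIX):
        problems.append("prompt does not end with ' -->'")
    if not completion.startswith(" "):
        problems.append("completion does not start with a space")
    if not completion.endswith(STOP):
        problems.append("completion does not end with '###'")
    return problems


def check_file(path):
    """Map 1-based line numbers to their problems, skipping blank lines."""
    report = {}
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if line.strip():
                problems = check_line(line)
                if problems:
                    report[number] = problems
    return report
```

An empty dictionary from check_file means every line follows the conventions. This only covers the three formatting rules above and does not replace the official data preparation tool.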
To check the correctness of your inputs, you can run the data preparation tool:
openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
I personally prefer to write the JSONL file directly because I have more control, and I find the plaintext easier to read than Excel cells.
After that, pick the tuned model and run it with a prompt, e.g.
fearless -->
curie output: ${p.She} never backs down from a fight. ${p.She} is always ready to stand up for what’s right, even if it means standing up to people who are better than her.
davinci output: ${p.She} never backed down from a fight, no matter how bad it seemed. ${p.She} never let ${p.his} fear stop ${p.him} from doing what ${p.she} thought was right. ${p.She} never worried about what was going to happen, only about what was happening right now.
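The outputs keep the ${p.…} placeholders, so they still need to be filled in before display. Below is a rough sketch of that step in Python. The pronoun table and the fill function are hypothetical, and I'm guessing that ${p.her} and ${p.his} resolve to the same possessive for a given character, since the samples mix both.

```python
import re

# Hypothetical values for one character; the keys mirror the placeholder
# names that appear in the samples and outputs above.
PRONOUNS = {
    "Sdesc": "Ada",
    "She": "She", "she": "she",
    "Her": "Her", "her": "her",
    "His": "Her", "his": "her",  # assumption: same possessive as ${p.her}
    "him": "her",
    "herself": "herself",
}

# Matches placeholders of the form ${p.name}
PLACEHOLDER = re.compile(r"\$\{p\.(\w+)\}")


def fill(text, values):
    """Replace each ${p.name} with values[name]; unknown names are left as-is."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


print(fill("${p.She} never let ${p.his} fear stop ${p.him}.", PRONOUNS))
# prints: She never let her fear stop her.
```

Leaving unknown placeholders untouched makes it easy to spot names the model invented that are not in the table.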
There is a difference in quality, but it seems quite minor in this situation. Also, quality apparently goes up every time you double the number of samples. You can even use the tuned model to generate more samples and then fix the output to your liking.