I have a JSONL file consisting of 500 prompt->completion pairs, as follows:
{"prompt": "Season=summer, Silhouette=sheath", "completion": "You will want to wear this gorgeous sheath silhouette dress everywhere this summer season!"}
{"prompt": "Sleeves=three-fourth-sleeves, Detail=frill, Silhouette=fitted, Neckline=sweetheart", "completion": "It has a sweetheart neckline, three-fourth-sleeves, a fitted silhouette frill detail which will turn all heads to you!"}
Then I fine-tune a model on the file containing these prompt->completion pairs, using the CLI as follows:
openai api fine_tunes.create -t ./generation.jsonl -m curie
Now when I use the model for generation of sentences given prompts, if the model has seen a combination during training, it notices that combination and provides a very similar sounding sentence.
The example you linked to shows that your format won’t work as well as the natural language format shown below.
You’re missing a few things that the tool should recommend to you (a separator at the end of the prompt, a space as the first character of your completion, an end token…).
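In case it helps, here is a minimal sketch of applying those three conventions by hand. The separator " ->" and the end token "\n" are assumptions chosen for illustration, and `format_record` and `to_jsonl` are hypothetical helper names, not part of the openai library.

```python
import json

SEPARATOR = " ->"  # assumed prompt separator; pick one that never appears in a prompt
END_TOKEN = "\n"   # assumed end token; pick one that never appears in a completion


def format_record(prompt: str, completion: str) -> dict:
    """Return one training record with separator, leading space, and end token."""
    return {
        "prompt": prompt.strip() + SEPARATOR,
        "completion": " " + completion.strip() + END_TOKEN,
    }


def to_jsonl(pairs) -> str:
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(json.dumps(format_record(p, c)) for p, c in pairs)


pairs = [
    (
        "Season=summer, Silhouette=sheath",
        "You will want to wear this gorgeous sheath silhouette dress everywhere this summer season!",
    ),
]
print(to_jsonl(pairs))
```

The same end token would then be passed as the stop sequence at generation time, so the model knows where a completion is supposed to finish.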
Thanks for your reply. It was indeed helpful in getting me started. I couldn’t access openai tools ... but I incorporated the suggestions which you gave me.
I included the separator -> at the end of the prompt.
I added a space as the first character of the completion.
However, I was wondering if there’s any other preprocessing step that I should apply since fine_tunes.prepare_data is not working…
Alternatively, how can I make the prepare_data command work?
Finally, why are the generated sentences incomplete? Besides temperature, do I need to change any other parameters when calling Completion.create, such as frequency_penalty or presence_penalty?
Which error message do you get when you say it’s not working?
You need the latest version of the Python library installed, which should come with the CLI tools. You can install it with:
pip install --upgrade openai
Sometimes it may not work if you do this in a virtual environment, or without having admin permissions.
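If you suspect an environment mix-up, a quick check like the one below may help. It is only a sketch: it prints which interpreter is running and which version of the package (if any) that interpreter can see. `installed_version` is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Shows which Python is running, so you can tell a virtual environment
# apart from the system-wide installation.
print("interpreter:", sys.executable)
print("openai version:", installed_version("openai"))
```

If this prints None, the upgrade probably went into a different environment. Running python -m pip install --upgrade openai with the same interpreter makes pip target that interpreter's environment.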
I would reduce the temperature for generations, since a lower temperature is more likely to stick to the format and complete the sentences. Putting a space before -> might also help a tiny bit. It also looks like you didn’t transform the prompt into natural language: it still uses the = sign, rather than the format suggested in the example you referenced.
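As a rough sketch of both suggestions: the first function rewrites a Key=value prompt into a plainer "key: value" form (one possible rendering, not the exact format from the referenced example), and the second collects generation settings with a low temperature. It assumes the legacy Completion.create parameters (model, prompt, temperature, max_tokens, stop); the helper names, the default values, and the model name are placeholders, and nothing here actually calls the API.

```python
def to_natural_language(prompt: str) -> str:
    """Turn 'Season=summer, Silhouette=sheath' into 'season: summer, silhouette: sheath'."""
    parts = []
    for item in prompt.split(","):
        key, _, value = item.partition("=")
        parts.append(f"{key.strip().lower()}: {value.strip()}")
    return ", ".join(parts)


def build_completion_kwargs(model: str, raw_prompt: str,
                            temperature: float = 0.2,
                            max_tokens: int = 60) -> dict:
    """Collect keyword arguments for a completion request (the request is not sent)."""
    return {
        "model": model,
        # Same separator as in training, with a space before it (assumed " ->").
        "prompt": to_natural_language(raw_prompt) + " ->",
        "temperature": temperature,  # lower values stay closer to the training format
        "max_tokens": max_tokens,    # too small a value cuts sentences off mid-way
        "stop": ["\n"],              # assumed end token used in the training completions
    }


kwargs = build_completion_kwargs("your-fine-tuned-model", "Season=summer, Silhouette=sheath")
print(kwargs["prompt"])
```

The training file would need its prompts converted the same way, so that what the model sees at generation time matches what it was fine-tuned on.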