Completion giving unintuitive results

Hello everyone.

I have been using the OpenAI API to build product descriptions from a set of tags, and recently I have been observing the following.

Inputs

A JSONL file consisting of 500 prompt -> completion pairs, for example:

{"prompt": "Season=summer, Silhouette=sheath", "completion": "You will want to wear this gorgeous sheath silhouette dress everywhere this summer season!"}
{"prompt": "Sleeves=three-fourth-sleeves, Detail=frill, Silhouette=fitted, Neckline=sweetheart", "completion": "It has a sweetheart neckline, three-fourth-sleeves, a fitted silhouette frill detail which will turn all heads to you!"}
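For reference, each line is generated from a tag dictionary with a small helper (a minimal sketch; the build_record name is just illustrative, not part of any library):

```python
import json


def build_record(tags: dict, description: str) -> str:
    """Serialize one training example as a single JSONL line.

    The prompt is the comma-separated "key=value" tag list and the
    completion is the human-written product description.
    """
    prompt = ", ".join(f"{k}={v}" for k, v in tags.items())
    return json.dumps({"prompt": prompt, "completion": description})


line = build_record(
    {"Season": "summer", "Silhouette": "sheath"},
    "You will want to wear this gorgeous sheath silhouette dress everywhere this summer season!",
)
print(line)
```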

Then I fine-tune a model on the file containing such prompt -> completion pairs using the CLI as follows:

openai api fine_tunes.create -t ./generation.jsonl -m curie

Now, when I use the model to generate sentences from prompts, if the model has seen a tag combination during training it recognizes that combination and produces a very similar-sounding sentence.

However, when I pass a random set of tags from our vocabulary, I see this:

sentence = openai.Completion.create(
    model="curie:ft-user-wjlwpajo0xur0ryehdgqhtua-2021-08-19-13-02-26",
    prompt="Occasion=party, Silhouette=aline",
    top_p=topP,
).choices[0]["text"]
print(sentence)

# Output is as follows:
', Print=camo, Neckline=sweetheart, Fabric=knitYour'

Can anyone suggest why this happens and how one should go about overcoming this problem?

Thanks!


For the same prompt as above, varying the temperature from 0 to 1 in intervals of 0.1 gave me this:

prompt='Occasion=party, Silhouette=aline'
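The sweep itself is just a loop over Completion.create (a sketch only; the sweep_temperatures function is illustrative and needs an API key to actually run):

```python
# 0.0, 0.1, ..., 1.0 without float-accumulation drift
temperatures = [round(i / 10, 1) for i in range(11)]


def sweep_temperatures(prompt: str) -> dict:
    """Call the fine-tuned model once per temperature and collect the text."""
    import openai  # imported here so the sketch can be read without the package

    results = {}
    for t in temperatures:
        response = openai.Completion.create(
            model="curie:ft-user-wjlwpajo0xur0ryehdgqhtua-2021-08-19-13-02-26",
            prompt=prompt,
            temperature=t,
        )
        results[t] = response.choices[0]["text"]
    return results
```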

Thanks!


Hi vahuja,

The example you linked to shows that your format won't work as well as the natural language format it demonstrates.

You're also missing a few things that the tool would recommend: a separator at the end of the prompt, a space as the first character of your completion, an end token, and so on.

Try running
openai tools fine_tunes.prepare_data -f <your_file>

This will reformat the file for you and suggest adding all the things mentioned above (apart from the natural language format).

Let me know if this works for you.


Since you didn’t include a prompt separator, your model simply continues by predicting other possible properties.


Hey @boris

Thanks for your reply; it was indeed helpful in getting me started. I couldn’t get openai tools ... to run, but I incorporated the suggestions you gave me:

  1. I included the separator -> at the end of every prompt.
  2. I added a space as the first character of every completion.
  3. I added an END token to every completion.
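In code, the three steps amount to something like this (a minimal sketch; the exact " ->" and " END" strings are the ones I chose, and prepare_data may suggest different ones):

```python
import json


def preprocess(record: dict) -> dict:
    """Apply the three fixes: prompt separator, leading space, end token."""
    prompt = record["prompt"].rstrip()
    if not prompt.endswith("->"):
        prompt += " ->"                      # 1. separator at the end of the prompt
    completion = " " + record["completion"].strip()  # 2. space as first character
    if not completion.endswith(" END"):
        completion += " END"                 # 3. end token on every completion
    return {"prompt": prompt, "completion": completion}


record = json.loads(
    '{"prompt": "Season=summer, Silhouette=sheath", '
    '"completion": "You will want to wear this dress!"}'
)
print(preprocess(record))
```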

The results seem to come out better.

However, I was wondering whether there is any other preprocessing step I should apply, since fine_tunes.prepare_data is not working…

Alternatively, how can I make the prepare_data command work?

Finally, why are the generated sentences incomplete? Do I need to change any parameters besides temperature when calling Completion.create, such as frequency_penalty or presence_penalty?

Thanks a ton!


What error message do you get when you say it’s not working?

You need the latest version of the Python library installed, which comes with the CLI tools. You can upgrade with:

pip install --upgrade openai

Sometimes this may not work if you do it in a virtual environment, or without admin permissions.

I would reduce the temperature for generations; that makes the model more likely to stick to the format and complete the sentences. Also, putting a space before the → separator might help a tiny bit. It also looks like you didn’t transform the prompt into natural language instead of the = signs, as suggested in the example you referenced.
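A minimal sketch of that last transformation (the sentence template and function name are just illustrative; the point is only to drop the key=value notation):

```python
def tags_to_natural_language(tag_string: str) -> str:
    """Rewrite "Occasion=party, Silhouette=aline" as a plain-English prompt."""
    pairs = [part.split("=", 1) for part in tag_string.split(", ")]
    # "Occasion=party" -> "party occasion", "Silhouette=aline" -> "aline silhouette"
    phrases = [f"{value} {key.lower()}" for key, value in pairs]
    return "Describe a dress with " + " and ".join(phrases) + " ->"


print(tags_to_natural_language("Occasion=party, Silhouette=aline"))
```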
