Generating a list of fake quotes

I have a big database of quotes and I was hoping to use GPT-3 to generate a bunch of fake quotes that match the style of a particular person. So for example, I could feed in 800 quotes from Bob and have the system generate 10 quotes that sound like something he would say. This is for a quiz-type thing where I want people to guess whether a quote is something the person really said or something generated by the AI. From what I understand, this is the kind of thing GPT-3 should be able to do, but I am having a lot of trouble figuring out how to 1. train the system on the example quotes and 2. generate quotes like the examples.

I tried creating a fine-tuned model with my quote database, submitting a file like this:

{"prompt":"","completion", " <quote>"}
{"prompt":"","completion", " <quote>"}
etc.

but that just seemed to completely mess up the model and made it output paragraphs of nonsense.
I also tried to create a fine-tuned model with specific prompts like this:

{"prompt": "Quote from Bob", "completion": " <quote>"}
{"prompt": "Quote from Bob", "completion": " <quote>"}

but the fine-tuning tool didn’t like having a bunch of identical prompts.

I also tried just submitting a smaller selection of quotes as a prompt, in the form:

Quotes from Bob:
1. <quote>
2. <quote>
3. <quote>
...
50. <quote>
10 more quotes from Bob:

but the system either just directly copied some of the example quotes or gave me a single generic quote. Also, this method obviously uses a ton of tokens for every submission.

I feel like this shouldn’t be that hard a problem, but I’m stumped. Any suggestions on how to train the model and/or how to structure the prompt to get the results I’m looking for?

In terms of prompt design, I am assuming you are using the new Instruct series engines. If that is the case, explicit instructions are very helpful.
Below I’ve modified your prompt to match the formats that have helped empirically; some additional tips follow the prompt.

Generate 10 new quotes matching the tone and tenor of the following quotes from Bob.
Quotes from Bob:
1. <quote>
2. <quote>
3. <quote>
...
50. <quote>

10 New quotes:
1.

Two more points:

  1. Set the frequency and presence penalties to higher values.
  2. Assuming this is not a manual (playground-based copy-and-paste) affair, you do NOT need to supply 50 quotes. Just sample 5 at random to generate 1 or 2 new quotes, and repeat that as many times as you want.
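If it helps, here is a rough sketch of that sampling loop in Python. The prompt builder is the part that matters; the completion call is commented out and uses the pre-1.0 `openai` library, and the engine name is an assumption, so adjust both to whatever you’re actually running:

```python
import random

def build_prompt(quotes, speaker="Bob", n_seed=5, n_new=2):
    """Sample a few seed quotes and wrap them in the instruct-style prompt above."""
    seed = random.sample(quotes, min(n_seed, len(quotes)))
    lines = [
        f"Generate {n_new} new quotes matching the tone and tenor "
        f"of the following quotes from {speaker}.",
        f"Quotes from {speaker}:",
    ]
    lines += [f"{i}. {q}" for i, q in enumerate(seed, 1)]
    lines += [f"{n_new} new quotes:", "1."]
    return "\n".join(lines)

# The completion call itself (pre-1.0 openai library; engine name is a guess):
# import openai
# response = openai.Completion.create(
#     engine="text-davinci-002",
#     prompt=build_prompt(all_bob_quotes),
#     max_tokens=120,
#     temperature=0.9,
#     frequency_penalty=0.5,  # per point 1 above: discourages verbatim copying
#     presence_penalty=0.5,
# )
```

Each call only ever sends a handful of quotes, so the token cost per request stays small no matter how big the database is.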

Hope that helps.

That prompt format seems to be much more successful. The idea of sending a random subset of seed quotes with each request makes a lot of sense, as well.
Thanks a bunch!

Glad I could help you.
Please do not ignore setting the frequency and presence penalties to higher values. I have had success at 0.5-0.6. You can use those as starting points.

Out of curiosity, was I totally on the wrong track in terms of doing it with fine-tuning? My reasoning was that I would want the largest possible selection of quotes for the “training”.

I don’t think that you were “totally” off-track. However, 50 quotes just aren’t enough to give you a meaningfully fine-tuned model. My diagnosis is that the model badly overfit your data, and hence repeated the texts verbatim.

Another possibility is that your prompt to the fine-tuned model was not appropriate. From the guide:

To use such a model you can write a few starting words of the haiku, and let the model complete the haiku. You could also let the model generate new haikus by increasing the temperature, and sampling from the model with an empty prompt. Use the stop sequence END during inference, to ensure the haikus end in the right place.

The first few words of the required quote need to be provided. So either your training text needs a “Quote from Bob” / “Quote from Tom” prefix on each instance, or you would have to write the first few words yourself at inference time.
