I’ve been having some fun with a Curie model fine-tuned on Shel Silverstein’s poems. His style uses fairly simple language, so I thought he’d be a good author to mimic. I used about 200 examples, with the poem title as the prompt and the poem body as the completion, so I can give the fine-tuned model a new title I made up and it composes a poem.
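For anyone who wants to reproduce the setup, here is a minimal sketch of turning (title, body) pairs into the prompt/completion JSONL records that Curie-era fine-tuning accepted. The `SEPARATOR` and `STOP` values are assumptions that follow the general advice of that era (a fixed marker ending each prompt, a stop sequence ending each completion). They are not details from the original post.

```python
import json

# Assumed conventions (not from the original post): a fixed separator marks
# the end of the prompt, and a stop sequence marks the end of the completion.
SEPARATOR = "\n\n###\n\n"
STOP = " END"


def to_example(title: str, body: str) -> dict:
    """Turn one (title, poem body) pair into a prompt/completion record."""
    return {
        "prompt": title.strip() + SEPARATOR,
        # A leading space on the completion was the usual advice for the
        # legacy prompt/completion format.
        "completion": " " + body.strip() + STOP,
    }


def to_jsonl(pairs) -> str:
    """Serialize (title, body) pairs as JSONL: one JSON object per line.

    json.dumps escapes the newlines inside each poem, so every record
    stays on a single physical line.
    """
    return "".join(json.dumps(to_example(t, b)) + "\n" for t, b in pairs)
```

To generate, you would send `title + SEPARATOR` as the prompt and pass `STOP` as the stop sequence so the model ends the poem cleanly. The training file itself is just `to_jsonl(pairs)` written to disk, where `pairs` is your (hypothetical) list of (title, body) tuples.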
Early results sometimes got stuck in repetitive loops. I found that a frequency penalty of 0.35 and a temperature of 0.7 worked fairly well.
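To show why a frequency penalty helps with repetition, here is a small sketch of how the two settings reshape the next-token distribution. The penalty follows the adjustment OpenAI documents for its API: each token’s logit is reduced by its count so far times the frequency penalty, plus the presence penalty once if it has appeared at all. Temperature divides the logits before the softmax. The toy logits are made up for illustration.

```python
import math
from collections import Counter


def adjust_logits(logits, generated, frequency_penalty=0.35, presence_penalty=0.0):
    """Penalize tokens in proportion to how often they have already appeared.

    logit[j] -= count[j] * frequency_penalty + (count[j] > 0) * presence_penalty
    """
    counts = Counter(generated)
    return {
        tok: logit - counts[tok] * frequency_penalty - (counts[tok] > 0) * presence_penalty
        for tok, logit in logits.items()
    }


def softmax_with_temperature(logits, temperature=0.7):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy example: "pill" has already been generated three times.
logits = {"pill": 2.0, "bird": 1.8, "noon": 1.5}
before = softmax_with_temperature(logits)
after = softmax_with_temperature(adjust_logits(logits, ["pill"] * 3))
```

With `pill` already generated three times, its logit drops by 3 × 0.35 = 1.05. The model is nudged toward other words rather than forbidden from repeating, which suits a poem built on a refrain like “A pill to…”.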
Here’s one called “What Does the Future Hold”:
I’ve heard there’s a pill they’re developing
That will take care of any eventuality:
A pill to make you love your children more,
A pill to make you sleep till noon,
A pill to let you read in the dark,
A pill to turn you into a bird,
A pill that lets you watch TV set–
I’ve even heard it rumored they’re trying it out.
A few more generated poems I liked are here: http://dean.dog/shel-silverstein-gpt3/
I like this a lot, as I grew up with Shel Silverstein! I’ve noticed that GPT-3 has trouble with rhyme schemes. Have you seen the same with this fine-tuned model?
Can you elaborate on that?
Rhyming does seem hard for the model to pick up. The fine-tuned model occasionally produced rhymes, but most of its output reads like free verse. Makes it feel a little more modernist, I guess.
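Fine-tuning on a few poems wouldn’t fix the BPE problem. GPT-3 sees text as byte-pair-encoded (BPE) tokens, which are chunks of spelling rather than sounds, so it never directly observes how words are pronounced. It has also already seen vastly more than 200 rhyming poems, so 200 more will make little difference. (It has probably seen many of those Shel Silverstein poems already, in fact, since he’s a popular and quotable poet.)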
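To make the BPE point concrete, here is a toy byte-pair-encoding learner. It is a simplified illustration (character-level, no end-of-word marker), not GPT-3’s actual tokenizer, which uses a fixed, pre-learned byte-level merge table. Merges are chosen purely by how often adjacent symbols co-occur, so the resulting tokens encode spelling, not sound.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges, always picking the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a tuple of characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["though"] * 5, 5)
```

Trained on a corpus containing only “though”, five merges collapse it into a single token, while “low” stays as `["l", "o", "w"]`. The two words rhyme, yet they share no token the model could use to notice it.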
I’ve speculated that full fine-tuning on encoded IPA transcriptions or rhyming dictionaries might be enough to instill knowledge of phonetics and rhyme. I’m unsure about cheaper, lightweight fine-tuning techniques. Rhyming isn’t a problem where GPT-3 already ‘knows’ the answer and just needs to ‘locate’ the task, so that tweaking a few parameters might shift its emphasis appropriately; it’s an absence of knowledge about the phonetics of tens of thousands of words.
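As a rough illustration of the rhyming-dictionary idea, the sketch below derives rhymes from phonetic transcriptions and turns them into prompt/completion records. It swaps IPA for ARPAbet, the notation used by the CMU Pronouncing Dictionary. The six entries are hand-written samples in that style, and the record format is hypothetical. Two words are treated as rhyming when their phones match from the last stressed vowel onward.

```python
# Hand-written sample entries in CMU Pronouncing Dictionary style (ARPAbet).
# The digit on a vowel marks stress: 1 = primary, 2 = secondary, 0 = unstressed.
PRONUNCIATIONS = {
    "though": "DH OW1",
    "low": "L OW1",
    "through": "TH R UW1",
    "threw": "TH R UW1",
    "tough": "T AH1 F",
    "stuff": "S T AH1 F",
}


def rhyme_key(phones: str) -> str:
    """Return the phones from the last stressed vowel to the end of the word."""
    parts = phones.split()
    for i in range(len(parts) - 1, -1, -1):
        if parts[i][-1] in "12":  # vowel carrying primary or secondary stress
            return " ".join(parts[i:])
    return phones  # no stressed vowel found: fall back to the whole transcription


def rhymes(word: str, pron=PRONUNCIATIONS) -> list:
    """All other words in `pron` whose rhyme key matches `word`'s."""
    key = rhyme_key(pron[word])
    return sorted(w for w, p in pron.items() if w != word and rhyme_key(p) == key)


def rhyme_examples(pron=PRONUNCIATIONS) -> list:
    """Hypothetical prompt/completion records that teach which words rhyme."""
    return [
        {"prompt": f"{w} rhymes with:", "completion": " " + ", ".join(rhymes(w, pron))}
        for w in sorted(pron)
        if rhymes(w, pron)
    ]
```

“Though”, “through”, and “tough” share most of their spelling but rhyme with “low”, “threw”, and “stuff” respectively, which is exactly what spelling-based tokens hide. This naive rule also counts homophones such as “through”/“threw” as rhymes.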
Very cool! Where did you get the training corpus?
I read this paper the other day and thought of your work here. The authors use a BERT classifier to determine which model generated text closest to the original authors’ writing. If you wanted to continue your awesome research, you could take this idea and perhaps:
- Train a classifier on Silverstein’s work (label 0) and the generated work (label 1), then generate more poems and classify them. The closer the average classification score is to 0, the better the generator.
- Moreover, you could use this classifier to guide the generator toward poems that score closer to 0.
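Here is a minimal sketch of both suggestions. To stay dependency-free it swaps the paper’s BERT classifier for a bag-of-words naive Bayes model, and it interprets “guiding the generator” as simple best-of-n reranking: generate several candidates and keep the one the classifier scores as most Silverstein-like. Both substitutions are assumptions, not the paper’s method.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    return text.lower().split()


class NaiveBayes:
    """Bag-of-words naive Bayes: a lightweight stand-in for a BERT classifier.

    Label 0 = Silverstein's poems, label 1 = generated poems.
    Both labels must appear in the training data.
    """

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def prob_generated(self, text: str) -> float:
        """P(label = 1 | text); values near 0 look more like Silverstein."""
        n_docs = sum(self.doc_counts.values())
        log_p = {}
        for c in (0, 1):
            lp = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(text):
                # Laplace smoothing so unseen words don't zero out a class
                lp += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + len(self.vocab)))
            log_p[c] = lp
        peak = max(log_p.values())  # normalize in log space for stability
        e0, e1 = math.exp(log_p[0] - peak), math.exp(log_p[1] - peak)
        return e1 / (e0 + e1)


def mean_score(clf, poems) -> float:
    """Average P(generated) over a batch; closer to 0 suggests a better imitation."""
    return sum(clf.prob_generated(p) for p in poems) / len(poems)


def best_of_n(clf, candidates):
    """Simple guidance: keep the candidate the classifier finds most Silverstein-like."""
    return min(candidates, key=clf.prob_generated)
```

Fit it with Silverstein’s poems labeled 0 and generated poems labeled 1, then call `mean_score` on a fresh batch of generations. Scoring poems the classifier was trained on would flatter the generator, so keep the evaluation batch separate.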
[Training GPT-2 to represent two Romantic-era authors - challenges, evaluations and pitfalls (kent.ac.uk)]
My apologies if you’ve already read this paper or had this idea. I found it helpful and thought you might enjoy it too.