dean
I’ve been having some fun with a Curie model fine-tuned on Shel Silverstein’s poems. His style uses fairly simple language, so I thought he’d be a good author to mimic. I used about 200 examples, with the poem title as the prompt and the poem body as the completion, so with my fine-tuned model I can just give it a new title that I made up and it composes a poem.
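In case the format is useful to anyone, the training file is just prompt/completion pairs in JSONL, roughly like this sketch (the titles and snippets below are placeholders, not the actual training file, and the separator is just one reasonable choice):

```python
import json

# Placeholder examples of the {title -> poem} pairs; the real file had ~200.
examples = [
    {"prompt": "Where the Sidewalk Ends\n\n",
     "completion": " There is a place where the sidewalk ends...\n"},
    {"prompt": "Sick\n\n",
     "completion": " 'I cannot go to school today,'...\n"},
]

with open("silverstein.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file then goes to the fine-tuning endpoint, e.g. via the CLI:
#   openai api fine_tunes.create -t silverstein.jsonl -m curie
```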
Initial results could get stuck in repetition loops. I found that a frequency penalty of 0.35 and a temperature of 0.7 worked fairly well.
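Generation is then just a Completion call against the fine-tuned model with those settings. Roughly like this sketch (the model ID, max_tokens, and stop sequence are placeholders; only the temperature and frequency penalty are the values just mentioned):

```python
import openai

openai.api_key = "sk-..."  # your key

resp = openai.Completion.create(
    model="curie:ft-personal-2022-01-01-00-00-00",  # placeholder fine-tune ID
    prompt="What Does the Future Hold\n\n",
    max_tokens=200,
    temperature=0.7,         # settings that avoided most repetition loops
    frequency_penalty=0.35,
    stop=["\n\n\n"],         # arbitrary stop sequence for this sketch
)
print(resp["choices"][0]["text"])
```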
Here’s one called “What Does the Future Hold”:
I’ve heard there’s a pill they’re developing
That will take care of any eventuality:
A pill to make you love your children more,
A pill to make you sleep till noon,
A pill to let you read in the dark,
A pill to turn you into a bird,
A pill that lets you watch TV set–
I’ve even heard it rumored they’re trying it out.
A few more generated poems I liked here: http://dean.dog/shel-silverstein-gpt3/
I like this a lot, as I grew up with Shel Silverstein! I’ve noticed that GPT-3 has trouble with rhyming schemes; have you seen the same with this fine-tuned model?
Can you elaborate on that?
dean
Rhyming does seem hard for the model to pick up. The fine-tuned model occasionally produced rhyming output, but mostly it reads like free verse. Makes it feel a little more modernist, I guess.
gwern
Finetuning on some poems wouldn’t make a difference to the BPE problems. GPT-3 has already seen vastly more than 200 rhyming poems. 200 more will make little difference. (It has probably seen many of those Shel Silverstein poems already, in fact, as he’s a quotable and popular poet.)
I’ve speculated that encoding IPA or rhyming dictionaries and doing a full finetune might be enough to instill phonetics & rhyme knowledge. I’m unsure about cheaper lightweight finetuning techniques: rhyming isn’t a problem where GPT-3 ‘knows’ the task already and just needs to ‘locate’ it, where tweaking a few parameters might shift emphasis appropriately - it’s an absence of knowledge about the phonetics of tens of thousands of words.
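If you want to see what the model actually conditions on, you can dump the BPE tokens directly. A quick sketch (GPT-3 uses the same byte-pair encoding as GPT-2, and the word list is just for illustration):

```python
from transformers import GPT2TokenizerFast

# GPT-3 shares GPT-2's byte-pair encoding, so this shows roughly what the
# model sees: spelling fragments, not sounds. (The 'Ġ' marks a leading space.)
tok = GPT2TokenizerFast.from_pretrained("gpt2")

# 'through', 'threw', and 'blue' rhyme but share no spelling, while 'though',
# 'rough', and 'cough' share '-ough' yet all sound different.
for word in ["through", "threw", "blue", "though", "rough", "cough"]:
    print(f"{word:>8} -> {tok.tokenize(' ' + word)}")
```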
Laen
Very cool! Where did you get the training corpus?
dean
Hi, @dean
I read this paper the other day and thought of your work here. The authors use a BERT classifier to determine which model generated content closest to the original authors’ style. If you wanted to continue your awesome research here, you could take this idea and perhaps:
- Train a classifier on Shel Silverstein’s work (label 0) and the generated work (label 1), then generate more poems and try to classify them. The closer the classification average is to 0, the better the generator (a rough sketch is at the end of this post).
- Moreover, you could use this classifier to guide the generator toward poems that classify closer to 0.
[Training GPT-2 to represent two Romantic-era authors - challenges, evaluations and pitfalls (kent.ac.uk)](https://www.cs.kent.ac.uk/people/staff/mg483/documents/piotr22Byron_and_Shelly.pdf)
My apologies if you’ve already read this paper or already had this idea. I found it helpful and thought you might enjoy it too.
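In case it helps, here’s a rough sketch of what that first bullet could look like with Hugging Face transformers (file names, model choice, and hyperparameters are placeholders, not code from the paper):

```python
# Label real poems 0 and generated poems 1, fine-tune a small BERT
# classifier, then score fresh generations by how close their average
# predicted label is to 0.
import torch
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

real = open("silverstein_real.txt").read().split("\n\n")             # label 0
generated = open("silverstein_generated.txt").read().split("\n\n")   # label 1

ds = Dataset.from_dict({
    "text": real + generated,
    "label": [0] * len(real) + [1] * len(generated),
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length",
                     max_length=256)

ds = ds.map(tokenize, batched=True).train_test_split(test_size=0.2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ss-classifier", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
)
trainer.train()

# Score a fresh batch of generated poems: the closer the mean predicted
# label is to 0, the harder they are to tell apart from the real corpus.
new_poems = open("new_generations.txt").read().split("\n\n")
inputs = tokenizer(new_poems, truncation=True, padding=True, max_length=256,
                   return_tensors="pt").to(model.device)
model.eval()
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1).float()
print("mean predicted label:", preds.mean().item())
```

The second bullet would need something extra on top of this, e.g. generating several candidates per title and keeping the ones the classifier scores closest to 0.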