Fine-Tuning GPT for Poetry

Hi All!

I’m fine-tuning a model to generate poems. In my fine-tuning dataset, I separate the verses of each poem with the \n character, as I would like each verse to appear on a new line when prompting the model later. Is this approach correct, or should I be using a different separator to ensure the model understands the structure correctly?


Welcome to the community!

As a poet interested in AI, I’ve learned a lot from @gwern.

As to your question, \n should be fine for a single line break.

The Rise of the Potato

Beneath the soil, unseen, I wait.
The Carrot reaches, proud and straight.
The Rabbit scurries, quick and blind.
But wisdom lingers, left behind.

They pluck the Carrot, sing its praise.
It fades, forgotten, in mere days.
The Rabbit feasts, yet knows no more.
It eats, it runs, it shits, it wars.

Yet in the dark, I multiply.
Cut me down, still I survive.
Not one, but many, roots entwine.
No single path, no rigid line.

The Carrot boasts, then wilts, then dies.
The Rabbit flees, then starves, then cries.
But I remain, as seasons turn.
From me, the fire, the feast, the urn.

I do not seek, I do not chase.
I only wait, I hold my place.
And when the famine calls at last,
It’s not the Carrot they will ask.

I simply asked it to create a poem with each verse on a different line.

You would simply send the normal text that you expect the AI to output.

A training file is a set of examples in JSON Lines format (JSONL), where each line must contain one entire training exchange, including the desired output. Just as in the chat completions API calls you make, each message string must be JSON-escaped, so it cannot contain literal linefeeds; a line break is written as the two-character escape \n instead.

In the resulting file that you send, newlines appear as the escaped \n, and any double quotes inside a message would likewise be escaped as \":

{"messages": [{"role": "system", "content": "haikubot"}, {"role": "user", "content": "theme: autumn"}, {"role": "assistant", "content": "Leaves whisper softly\nAutumn paints the earth with gold\nNature's farewell song"}]}
{"messages": [{"role": "system", "content": "haikubot"}, {"role": "user", "content": "theme: morning"}, {"role": "assistant", "content": "Morning breaks gently\nDew kissed petals awaken\nSunlight breathes life new"}]}
{"messages": [{"role": "system", "content": "haikubot"}, {"role": "user", "content": "theme: silence"}, {"role": "assistant", "content": "Silence fills the night\nStars whisper without a sound\nPeace in quietude"}]}

I use very minimal system and user prompting - the magic of fine-tuning.
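
To check that the escaped \n in the file really becomes a line break once decoded, here is a small sketch using Python's standard json module. The line is the first example above, pasted as a raw string so the backslash stays literal, as it would be on disk.

```python
import json

# One line of the JSONL file, exactly as stored on disk. The raw string keeps
# backslash-n as two characters instead of letting Python turn it into a newline.
line = r'''{"messages": [{"role": "system", "content": "haikubot"}, {"role": "user", "content": "theme: autumn"}, {"role": "assistant", "content": "Leaves whisper softly\nAutumn paints the earth with gold\nNature's farewell song"}]}'''

example = json.loads(line)
poem = example["messages"][-1]["content"]

# After decoding, each escape has become a real newline character.
print(poem)
print(poem.count("\n"))
```

The printed poem comes out on three lines, which is the text the model is trained to produce.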

You don't need to write those escapes by hand, though: keep the poem as normal text in a string and let a JSON serializer produce the properly formatted JSON for each line of your JSONL training file. For example, in Python, a normally written poem can go into a triple-quoted string:

haiku1 = """
Leaves whisper softly
Autumn paints the earth with gold
Nature's farewell song
""".strip()

Or even make a little user interface for generating your training data.
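
For the serialization step itself, here is a minimal sketch with the standard json module, reusing the haiku string from above. The helper name make_example and the output filename haiku_train.jsonl are made up for illustration; the system and user text simply follow the examples earlier in this post.

```python
import json

haiku1 = """
Leaves whisper softly
Autumn paints the earth with gold
Nature's farewell song
""".strip()

def make_example(theme: str, poem: str) -> dict:
    """Build one chat-format training example (hypothetical helper)."""
    return {
        "messages": [
            {"role": "system", "content": "haikubot"},
            {"role": "user", "content": f"theme: {theme}"},
            {"role": "assistant", "content": poem},
        ]
    }

examples = [make_example("autumn", haiku1)]

# json.dumps escapes newlines and double quotes, so each example stays on one line.
lines = [json.dumps(ex, ensure_ascii=False) for ex in examples]

# Filename is an assumption; one JSON object per line is what makes it JSONL.
with open("haiku_train.jsonl", "w", encoding="utf-8") as f:
    for line in lines:
        f.write(line + "\n")
```

Opening the resulting file should show the poem with \n escapes inside the assistant string, matching the hand-written lines above.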

Just a reminder: with fine-tuning you are not instructing, but showing by example. If you want six stanzas, your training data needs examples where the user asks for six stanzas and the assistant reply actually contains six.
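
As a rough illustration of showing rather than telling, the sketch below builds a training example in which the request and the reply agree on the stanza count. Separating stanzas with a blank line (\n\n) is an assumption here, as are the prompt wording and the helper name stanza_example. The two stanzas come from the potato poem earlier in the thread; a real six-stanza example would work the same way.

```python
import json

# Each stanza is a list of lines (first two stanzas of the poem quoted above).
stanzas = [
    [
        "Beneath the soil, unseen, I wait.",
        "The Carrot reaches, proud and straight.",
        "The Rabbit scurries, quick and blind.",
        "But wisdom lingers, left behind.",
    ],
    [
        "They pluck the Carrot, sing its praise.",
        "It fades, forgotten, in mere days.",
        "The Rabbit feasts, yet knows no more.",
        "It eats, it runs, it shits, it wars.",
    ],
]

def stanza_example(theme: str, stanzas: list) -> dict:
    """Hypothetical helper: the user prompt states the stanza count, the reply shows it."""
    # Single newline between lines, blank line between stanzas (assumed convention).
    poem = "\n\n".join("\n".join(lines) for lines in stanzas)
    return {
        "messages": [
            {"role": "user", "content": f"theme: {theme}, stanzas: {len(stanzas)}"},
            {"role": "assistant", "content": poem},
        ]
    }

example = stanza_example("potato", stanzas)
jsonl_line = json.dumps(example)  # one line, ready to append to the training file
print(jsonl_line)
```

Whatever separator you pick, use it the same way in every training example so the model sees one clear pattern.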
