Fine-tuning for writing style - lessons and questions

Apologies for the long post ahead, I hope you can bear with me as I am trying to learn :slight_smile:

Background: My task is getting GPT to write marketing emails for our customers. Each email’s content and writing style need to be tailored based on:

  • content (e.g., types of products we want to promote)
  • customer segment (e.g., frequent buyer, lapsed, etc)
  • time of the year (e.g., Black Friday)

So far, I have used GPT-4-turbo in a RAG + few-shot learning approach, prompting with example emails (we have a lot of historical emails that we can treat as ‘ground truth’), and I get decent results. Let’s say this series of prompts and intermediary outputs forms the ‘dialogue’. However, one problem I still have is that the writing style of the email sometimes appears superficial. For example:

  • Emails often start with something like ‘I hope this email finds you well…’, but I want openers like ‘Hey X…’ or ‘Clock is ticking!’
  • The language is too formal, e.g. ‘imagine/envision…’, where I want ‘Just think about…’ or ‘Do you know…’

To fix these issues I have already tried extra prompting, such as ‘Look for phrases like X, Y, Z and replace them with something informal…’, but with little success.
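One thing I could still combine with the prompting is a deterministic guard: scan each draft for the phrases I want to avoid and ask the model to retry if any show up. A minimal sketch of the idea (the phrase list, model name, and rewrite prompt are all placeholders, not what I actually ran):

from openai import OpenAI

client = OpenAI()

# Phrases I never want to see in the final email (placeholder list).
BANNED = ["i hope this email finds you well", "envision", "imagine"]

def has_banned_phrase(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BANNED)

def draft_email(prompt: str, max_retries: int = 3) -> str:
    messages = [{"role": "user", "content": prompt}]
    reply = ""
    for _ in range(max_retries):
        reply = client.chat.completions.create(
            model="gpt-4-turbo", messages=messages
        ).choices[0].message.content
        if not has_banned_phrase(reply):
            return reply
        # Feed the draft back and ask for an informal rewrite.
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user", "content": "Rewrite this email in a casual, "
                "punchy tone and avoid stock phrases like 'I hope this "
                "email finds you well'."},
        ]
    return reply  # give up after max_retries attempts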

Fine-tuning: My thought, therefore, is to fine-tune my own model that can write emails in the style I want. I did some research on this forum and found discussions like this, this, this, and this. I tried something that failed, and I suspect I did it wrong, so I am going to explain in more detail here.

Initially, I thought I could fine-tune GPT-3.5 so that it learns the ‘writing style’ but also retains its ability as a general-purpose chatbot. But this didn’t work. What I did is similar to this thread, i.e., pairs of previous/next sentences, in the hope that the fine-tuned model would learn the writing style. My training data look like:

{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "sentence1 from example emails"}, {"role": "assistant", "content": "sentence2 from example emails"}]}
{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "sentence2 from example emails"}, {"role": "assistant", "content": "sentence3 from example emails"}]}
...
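For concreteness, here is a sketch of how pairs like these can be generated (the naive split on full stops is just for illustration):

import json

def sentence_pairs_to_jsonl(emails: list[str], path: str) -> None:
    """Turn each email into consecutive previous/next sentence pairs."""
    with open(path, "w") as f:
        for email in emails:
            # Naive split on full stops; enough to illustrate the format.
            sentences = [s.strip() for s in email.split(".") if s.strip()]
            for prev, nxt in zip(sentences, sentences[1:]):
                record = {"messages": [
                    {"role": "system", "content": ""},
                    {"role": "user", "content": prev},
                    {"role": "assistant", "content": nxt},
                ]}
                f.write(json.dumps(record) + "\n")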

But it appears that all the model has learned is to ‘map’ my input text to another text, and it has also lost its general chatbot ability. For example, if I say ‘Please write a marketing email for mobile phones’, it does not actually do that but produces another sentence that looks random and makes no sense.

So my Question 1 is: will fine-tuning cause the model to lose its general chatbot nature, so that it can only do one thing, namely map your input to the outputs in the training data? Is this correct?

What next: Building on this, I am rethinking my approach and I have two ideas.

First, following this thread, I think I could fine-tune using this setup:

{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "email with a neutral tone/style"}, {"role": "assistant", "content": "the ground truth email"}]}

Here the ‘user’ content would be the initial email in the neutral tone and the assistant’s output would be the final email in the desired tone. To create the neutral-toned email, I could use an idea like the text neutraliser. Then I could approach my task in a ‘hybrid’ manner: RAG + few-shot learning on GPT-4-turbo using the dialogue to get an output email (with a neutral tone/style), then ask the fine-tuned model above to revise its style to get the final output email.
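A sketch of how such a training file could be built, assuming a simple neutraliser prompt (the prompt wording and the choice of GPT-4-turbo as the neutraliser are assumptions):

import json
from openai import OpenAI

client = OpenAI()

NEUTRALISE_PROMPT = (
    "Rewrite the following email in a plain, neutral tone. "
    "Keep all facts and offers; strip any stylistic flourishes.\n\n"
)

def build_style_training_file(ground_truth_emails: list[str], path: str) -> None:
    with open(path, "w") as f:
        for email in ground_truth_emails:
            neutral = client.chat.completions.create(
                model="gpt-4-turbo",
                messages=[{"role": "user",
                           "content": NEUTRALISE_PROMPT + email}],
            ).choices[0].message.content
            record = {"messages": [
                {"role": "system", "content": ""},
                {"role": "user", "content": neutral},     # neutral draft in
                {"role": "assistant", "content": email},  # styled email out
            ]}
            f.write(json.dumps(record) + "\n")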

So my Question 2 is: is this a reasonable approach? In the fine-tuning data, do I need to add an instruction in the ‘user’ content, say ‘Rewrite the following email to improve its style: [email with a neutral tone/style]’?
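For concreteness, the inference side of this hybrid approach would run roughly like the sketch below (the instruction prefix in stage 2 is exactly what Question 2 asks about, and the fine-tuned model id is a placeholder):

from openai import OpenAI

client = OpenAI()
FINE_TUNED_MODEL = "ft:gpt-3.5-turbo:my-org:email-style:xxxx"  # placeholder id

def write_email(dialogue_messages: list[dict]) -> str:
    # Stage 1: RAG + few-shot 'dialogue' on GPT-4-turbo -> neutral draft.
    draft = client.chat.completions.create(
        model="gpt-4-turbo", messages=dialogue_messages
    ).choices[0].message.content

    # Stage 2: the fine-tuned model restyles the draft. Whether the
    # instruction prefix below is needed is exactly Question 2.
    styled = client.chat.completions.create(
        model=FINE_TUNED_MODEL,
        messages=[{"role": "user", "content":
                   "Rewrite the following email to improve its style:\n\n"
                   + draft}],
    ).choices[0].message.content
    return styled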

Second, this thread says that the fine-tuning data can contain multiple messages, not just a single ‘user-assistant’ pair. So I think I could try a setup that includes the whole ‘dialogue’ that I use in my RAG + few-shot approach, like this:

{"messages": [the ‘dialogue’]} =
{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "prompt 1"}, {"role": "assistant", "content": "output1"}, {"role": "user", "content": "prompt 2"}, {"role": "assistant", "content": "output2"}, ...]}

So my Question 3 is: is this a reasonable approach? When using the fine-tuned model, what would my input be? Do I prompt it step by step (prompt 1, wait for the answer; prompt 2, wait for the answer), or do I need to compose the whole dialogue chain and feed it as one input?

Thanks for taking the time to read this! Any comments - either general comments or answers to any of the questions - are highly appreciated!!

Hi there - could you please rephrase this setup for my benefit?

{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "email with a neutral tone/style"}, {"role": "assistant", "content": "the ground truth email"}]}

Your input here would be the initial email in the neutral tone, and the assistant’s output would be the final email in the desired tone - is this correct?

Furthermore, I can confirm that it is possible to fine-tune a model based on a longer dialogue. I have only included two rounds of interaction once, as part of an experiment, but it worked at the time.

Yes you are totally right! I have updated my post to make it clearer.

Great to know that you can fine-tune a model using a dialogue! Can I ask: once the model is fine-tuned and you are using it, do you have to

  1. concatenate your full dialogue as one input, or
  2. engage with the model in a back-and-forth dialogue (in your case, two rounds) in the format of the training data?

1 or 2?

Thanks

I engaged with the model in a back-and-forth dialogue when I tested it.
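In code, that just means keeping a running messages list and appending each reply before sending the next prompt. A minimal sketch (the model id is a placeholder):

from openai import OpenAI

client = OpenAI()
MODEL = "ft:gpt-3.5-turbo:..."  # placeholder for the fine-tuned model id
messages = [{"role": "system", "content": ""}]

def ask(prompt: str) -> str:
    # Send one turn and keep the reply in the running history.
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model=MODEL, messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# Two rounds, mirroring the structure of the training dialogues.
ask("prompt 1")
ask("prompt 2")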

To me it sounds like you may also want to give Assistants a try, especially if you want a hybrid: a bot that behaves in line with your instructions (similar to a fine-tuned model) but at the same time can engage on other aspects. There you could also upload your ground-truth examples, which the assistant could draw on to compile responses. It’s easy to experiment using the assistant interface on the OpenAI platform.


On the original point regarding this schema:

{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": "email with a neutral tone/style"}, {"role": "assistant", "content": "the ground truth email"}]}

Yes, your understanding is correct. You could create training files that follow this pattern and then fine-tune your model accordingly. It should produce the desired results.

Here too, you might want to do some small-scale testing initially to see whether you need to tweak the system instruction to get optimal outcomes. But given that you can fine-tune even with a very low number of examples, that should not be too burdensome or costly.
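For the mechanics, uploading the training file and starting the job are just two calls with the OpenAI Python SDK (the file name is a placeholder):

from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file.
training_file = client.files.create(
    file=open("style_training.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on the base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll this id until the job finishes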
