Fine-tuning for social media text generation

Hello!

I’m building a LinkedIn post generator. The goal is to generate a full, almost-ready-to-publish post from a prompt of 1–2 sentences (e.g. “I’m happy to announce that my startup has raised 1M Series A”). The generated posts need to be fun and exciting 🙂 Also, it’s French posts only.

I tried many strategies:

  • Attempt 0: few-shot prompting (text-davinci-002) ⇒ zero gibberish, but the posts are neither fun nor exciting, and they stay very close to the prompt and/or the few-shot examples

  • Attempt 1: fine-tune with 40 examples (Davinci, n_epochs = 4) ⇒ 90% gibberish

  • Attempt 2: fine-tune with 500 examples (Davinci, n_epochs = 4) ⇒ much better in terms of fun, but still around 50% gibberish

  • Attempt 3: fine-tune with 4,000 examples (Davinci, n_epochs = 1) ⇒ 90% gibberish, maybe because n_epochs = 1

  • Attempt 4: fine-tune with 4,000 examples (Davinci, n_epochs = 4) ⇒ 60% gibberish, which makes no sense to me (see the data-format sketch below)
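
For reference, with the legacy fine-tuning flow, a common source of gibberish is missing the fixed separator at the end of each prompt and the stop marker at the end of each completion, so the model never learns where a post ends. Here is a minimal sketch of data preparation and the fine-tune launch, assuming the pre-1.0 openai SDK; the API key, file name, and the example pair are placeholders, not my actual data:

```python
import json
import openai  # legacy (pre-1.0) SDK, matching the era of this thread

openai.api_key = "sk-..."  # hypothetical key

SEPARATOR = "\n\n###\n\n"  # fixed marker ending every prompt
STOP = " END"              # fixed marker ending every completion

# Hypothetical (short prompt, full French post) pairs standing in for the dataset
training_pairs = [
    ("I'm happy to announce that my startup has raised 1M Series A",
     "...le texte complet du post LinkedIn..."),
]

# One JSON object per line; note the completion's leading space
with open("posts.jsonl", "w", encoding="utf-8") as f:
    for short_prompt, full_post in training_pairs:
        record = {"prompt": short_prompt + SEPARATOR,
                  "completion": " " + full_post + STOP}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# Upload the file and launch the fine-tune (legacy endpoints)
upload = openai.File.create(file=open("posts.jsonl", "rb"), purpose="fine-tune")
openai.FineTune.create(training_file=upload.id, model="davinci", n_epochs=4)
```

At inference time the prompt has to end with the same separator, and the call should pass `stop=[" END"]`; otherwise the model keeps generating past the post, which often reads as gibberish.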

Am I doing something wrong here? After growing the dataset from 40 to 4,000 examples, shouldn’t my results have improved substantially?

Should I increase n_epochs as the dataset gets larger? (Even though the official OpenAI docs say that “1-2 epochs tends to work better for these use cases”.)

My fear is that I’ve hit a ceiling and that quality won’t improve even with many more examples 😕

In the past, Boris from OpenAI has stated: “Increasing the dataset size will make a much bigger difference than tinkering with the hyperparameters. My advice would be to leave the epochs at 4, unless you have a very small dataset.”

From my personal experience, larger datasets don’t always correlate with better results. If you continue A/B testing larger datasets, I believe you’ll start to notice diminishing returns. Personally, I would stick with a few thousand examples and then keep A/B testing different datasets of the same size.
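
Concretely, the A/B loop can be as simple as generating from both fine-tunes on the same held-out prompts and blind-rating the outputs. A minimal sketch with the legacy SDK; the model names and prompts are hypothetical, and the separator/stop must match whatever the training data actually used:

```python
import openai  # legacy (pre-1.0) SDK

openai.api_key = "sk-..."  # hypothetical key

SEPARATOR = "\n\n###\n\n"
HELD_OUT_PROMPTS = [
    "I'm happy to announce that my startup has raised 1M Series A",
    # ...more one-liners never seen in training
]

# Two fine-tunes trained on different same-size datasets (names hypothetical)
MODELS = ["davinci:ft-yourorg:dataset-a", "davinci:ft-yourorg:dataset-b"]

for model in MODELS:
    for prompt in HELD_OUT_PROMPTS:
        out = openai.Completion.create(
            model=model,
            prompt=prompt + SEPARATOR,
            max_tokens=600,   # leave room for a full post
            temperature=0.8,  # some creativity for "fun" posts
            stop=[" END"],    # must match the training completions' stop marker
        )
        print(model, "->", out.choices[0].text.strip())
        # Then blind-rate each output (fun / not fun, gibberish / clean) per model.
```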


I think giving detailed instructions to text-davinci-002 will work better than fine-tuning for this use case. I suspect that if it’s not learning this simple task from 4,000 examples, it’s not going to learn it. Davinci was trained on a huge swath of the internet, so it already knows very well what a LinkedIn post is. Have you tried including something like this in the prompt?

“An intern at a high-growth, funded tech start-up is tasked with writing LinkedIn posts promoting the company’s achievements. The posts must be fun, engaging and approximately 500 words long. Each post should be unique. Please draft 10 possible LinkedIn posts that would fulfill the intern’s assignment.
1.”
Let me know if this works. You might need to be more precise about the topic the 10 posts will cover (e.g. obtaining funding), and then repeat the prompt with a new topic (e.g. new hires).
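
If it helps, here is how that prompt might be sent with the legacy SDK; the temperature and max_tokens values are guesses to tune, not tested settings:

```python
import openai  # legacy (pre-1.0) SDK

openai.api_key = "sk-..."  # hypothetical key

prompt = (
    "An intern at a high-growth, funded tech start-up is tasked with writing "
    "LinkedIn posts promoting the company's achievements. The posts must be fun, "
    "engaging and approximately 500 words long. Each post should be unique. "
    "Please draft 10 possible LinkedIn posts that would fulfill the intern's "
    "assignment.\n1."
)

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=prompt,
    max_tokens=2000,  # room for several long posts
    temperature=0.9,  # favor varied, lively writing
)
print("1." + response.choices[0].text)
```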


Unfortunately, I’ve tried such things, and every time GPT-3 ignores the “500 words long” instruction: each post comes out at around 10–15 words.
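
One thing worth ruling out, assuming the call goes through the Completion API: max_tokens defaults to 16, which by itself truncates output to roughly 10–15 words no matter what the prompt asks for. A minimal sketch with the limit raised explicitly (the prompt and numbers are illustrative):

```python
import openai  # legacy (pre-1.0) SDK

openai.api_key = "sk-..."  # hypothetical key

response = openai.Completion.create(
    model="text-davinci-002",
    prompt=(
        "Write a fun, engaging LinkedIn post of roughly 500 words announcing "
        "that a startup has raised a 1M Series A.\n\nPost:"
    ),
    max_tokens=800,   # default is 16, which caps output at ~10-15 words
    temperature=0.9,
)
print(response.choices[0].text.strip())
```

Note that max_tokens only removes the ceiling; it doesn’t push the model to write more, so a strong format cue in the prompt still matters for length.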