I have an archive of thousands of my own articles that follow a common pattern. It’s the same industry every time, same broad topic, same format/approach to each, similar length… a journalistic article, pivoted off an interview, to present the speaker’s insights, interspersed with my analysis and context.
Going forward, I am keen to investigate using AI to do this - I guess, by fine-tuning text-davinci-003 or the later ChatGPT API.
How viable is this, and what do I need to think about?
I see that fine-tuning is actually a process of providing a volume of prompt-completion pairs. I assume the completions I provide should be my finished articles (?). But what about the prompts?
Transcripts of the initial interviews are available in very many cases (though not necessarily for all of the thousands of articles, and not easily matchable as they are not stored together). Should these be included somehow?
Should the prompts also include some kind of instructions? Maybe the same ones in every case, alongside the transcript?
Lately, I have been trying to prompt ChatGPT to do this by issuing instructions that I believe describe what I do manually. But the results have been mixed, and the prompts have been long. I assume tuning would give it something more specific to mimic: me. Is that accurate? Should I still give the fine-tuning data, or the fine-tuned model, the same kinds of prompts, or will the actual source transcripts be sufficient?
My hope would be that the machine could write in a way that adheres to things like:
Presentation of interviewee quotes
Pattern of cross-head use
It should always preserve speaker quotes, when used as direct quotes, in quoted text.
When the speaker is quoted outside of a direct quote, the attribution should include novel paraphrasing.
Practically, I have an interim ambition to convert all these articles from their current format into Markdown files with YAML metadata. I know fine-tuning data has to be supplied in JSONL format. Hopefully, I could find a way to convert from Markdown to the required format.
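For what it's worth, the Markdown-to-JSONL step can be sketched with nothing but the standard library. This is a minimal, illustrative version, assuming each article is a `.md` file with `---`-delimited YAML front matter; the `###` separator and `END` stop marker follow the legacy prompt/completion fine-tuning convention, and all the content strings are placeholders:

```python
import json

def split_front_matter(md_text):
    """Split a Markdown string with '---'-delimited front matter into
    (front_matter, body). Handling here is deliberately simplistic."""
    parts = md_text.split("---", 2)
    if len(parts) == 3:
        return parts[1].strip(), parts[2].strip()
    return "", md_text.strip()

def to_jsonl_line(prompt, completion):
    # Legacy prompt/completion format: a separator at the end of the
    # prompt, and a leading space plus a stop marker on the completion.
    return json.dumps({"prompt": prompt + "\n\n###\n\n",
                       "completion": " " + completion + " END"})

md = """---
title: Example
---
Article body here."""

front, body = split_front_matter(md)
line = to_jsonl_line("Transcript text would go here.", body)
print(line)
```

In practice you'd loop this over the whole archive and write one line per article, but the shape of each training record is the part that matters.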
I don’t have technical answers, but I have a similar, though more limited, ambition to yours: to have ChatGPT write in a style similar to my own.
I have not persevered much with it. But my approach has been to ask it to find writers with characteristics that match how I feel I write, and then to ask it to write in their style, on the theory that it must have some training on a style if it can surface those writers in the first place.
It’s not ideal. More like a poor man’s version of what you’re trying to achieve.
eg. “The following text is a transcript of an interview. Write a 650-word article, the focus should be communicating the views and insights of the interviewee. 66% of the article should be direct quotes. In all other material, remain objective, do not positively endorse the interviewee’s viewpoints, do not use a “conclusion”. Add a smart headline and use Markdown ## for sub-headings…”
Is fine-tuning subject to max-token limits? i.e. an interview transcript fed as a prompt is a lot longer than the small snippets I see most people using for training. And the resulting article at 650+ words is longer than your standard custom entity classifier’s output, too.
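One way to sanity-check this before committing to a model: a common rule of thumb for English is roughly 4 characters per token. This sketch uses that heuristic only, so the numbers are ballpark (OpenAI's tiktoken library, not shown here, gives exact counts), with stand-in word counts for a typical transcript and article:

```python
def rough_token_estimate(text):
    """Very rough heuristic: ~4 characters per English token.
    For exact counts you'd use OpenAI's tiktoken library instead."""
    return max(1, len(text) // 4)

transcript = "word " * 3000   # stand-in for a ~3,000-word interview transcript
article = "word " * 650       # stand-in for a 650-word finished article

total = rough_token_estimate(transcript) + rough_token_estimate(article)
print(total)  # combined prompt + completion estimate
```

If that total comes out well above 2,049, the old davinci context window can't hold a full transcript-plus-article pair, which is exactly the concern raised above.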
Keen to learn more about how to take this forward.
… Hmm, I read on the forum that text-davinci-003 cannot be additionally fine-tuned, as it is already a davinci base model trained for handling instructions. Is that right?
If so, does this suggest I can ONLY use davinci base, and that I would ONLY prompt with the source transcript, never with clarifying writing instructions?
Additionally, if davinci can only handle 2,049 tokens combined across prompt and completion, that may mean I’m up against it for fine-tuning something to take a transcription (prompt) and show a completion (article), right?
I have a large corpus of my own writing, and I’d like to be able to generate an essay from a list of bullet points, in markdown, in my own voice.
I think my first attempt will be to:
take a few text files,
parse them into 500-token chunks with 30% overlap,
iterate over the chunks, using a model to generate synthetic data: the example input that could have been the prompt that generated each chunk,
parse the result:chunk pairs into a CSV file,
use the CLI to parse this into JSONL,
use this to fine-tune a model in my voice,
try generating a 1,500-word essay in my voice.
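The chunking step in that plan can be sketched like this. I'm splitting on words rather than real tokenizer tokens, so "500 tokens" is approximate, and a 30% overlap means each chunk starts 350 words after the previous one:

```python
def chunk_words(text, chunk_size=500, overlap=0.3):
    """Split text into word chunks of chunk_size with fractional overlap.
    Word-based as a stand-in for real tokenizer tokens."""
    words = text.split()
    step = int(chunk_size * (1 - overlap))  # 500 * 0.7 = 350
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# A 1,200-word text yields three overlapping 500-word chunks.
chunks = chunk_words("w " * 1200, chunk_size=500, overlap=0.3)
print(len(chunks))
```

Each chunk then becomes the "completion" side of a training pair, with the synthetic prompt generated from it as the input.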
What I’m not sure of, is the relative weight of the tuning vs. the original model.
Specifically, if the tuning examples are all 500-token chunks, is that going to interfere with the desired output length, or will it still generate a 1500-word essay, just with increased probability of selecting words that I tend to use?
We’ll see, but I’m curious to hear if you had any further exploration with this.
You can try taking your corpus, and creating neutralized versions of it.
For example, here is a neutralized version of Mike Myers’ “Sprockets” character, using GPT-4.
“Your presence intimidates me to the point of humiliation. Would you care to strike me?”
“Your presence makes me feel uncomfortable. Would you mind adjusting your approach?”
So you take your corpus, feed in your “Styled Input” and have GPT-X create the “Neutral Output”.
The system prompt I used in GPT-4 was simply “Neutralize the following text. Make the text have a neutral voice.” But I spent 10 seconds on it, you should try improving the prompt. And/or use other (cheaper) models if you want.
Then you reverse this to create your fine-tune pairs. So the input is the Neutral version and the desired target output is the Styled version.
OK, so all this fine-tune does is create your STYLE. But not your content.
So you will have to feed in content written close to the neutral version, run it through the fine-tune, and then, in theory, this content comes out in your voice.
After you generate the fine-tune, you just run the raw text in without a prompt. I’m thinking davinci-002 might be a good choice to fine-tune, since you aren’t creating a chatbot. So you run the text in, and it produces text out, in your style.
In theory, yes, there should be a one-to-one relation between input and output, because that is what the expected relation is in your training data. Unless you are extremely long winded or terse. But that will be obvious when comparing neutralized to styled text pairs.
For training, the input is Neutral, and the output is Styled.
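Assembling those pairs into a training file is mechanical. Here is an illustrative sketch, where the (neutral, styled) pair is the Sprockets example from above; in practice the neutral side would come from running your corpus through the “Neutralize the following text” pass, and the separator/leading-space conventions follow the legacy prompt/completion format:

```python
import json

# Hypothetical (neutral, styled) pairs. In practice, generate the neutral
# side with a "Neutralize the following text" pass over your own corpus.
pairs = [
    ("Your presence makes me feel uncomfortable. "
     "Would you mind adjusting your approach?",
     "Your presence intimidates me to the point of humiliation. "
     "Would you care to strike me?"),
]

# One JSON object per line: neutral text in, styled text out.
with open("style_tune.jsonl", "w") as f:
    for neutral, styled in pairs:
        f.write(json.dumps({"prompt": neutral + "\n\n###\n\n",
                            "completion": " " + styled}) + "\n")
```

Note the direction: the model learns to map neutral input to styled output, never the reverse.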
@N2U’s post at BASE_URL/t/fine-tuning-of-my-personal-blog/429451/4 (can’t include links, probably because I just joined) seems to also validate the importance of the example output’s length, and the reply above validates the importance of the example output’s content.
Ok, so the tuning data basically says “When you see input like this, make output more like this.”
So if the intention is to use the tuned model as a style mapper or “voice converter” on existing text, you’d want neutral_blog_post:styled_blog_post pairs. This is the approach I think you’re describing. You could generate an article with ChatGPT, etc. and then pass it through the tuned model to get the same article in your voice.
If the intention is to use the tuned model as a blog post generator, you’d need typical_input_prompt:styled_blog_post pairs. Then your input wouldn’t be an existing text, but a prompt with the topic + bullet points, which gets a complete blog post in your voice.
Does that sound right?
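Concretely, the two pair shapes I mean might look like this (field names follow the legacy prompt/completion format; all the content strings are illustrative):

```python
import json

# Shape 1: style mapper / "voice converter". Input is an existing
# neutral draft; output is the same text rewritten in your voice.
style_pair = {
    "prompt": "A neutral draft of the blog post goes here.\n\n###\n\n",
    "completion": " The same post, rewritten in your voice.",
}

# Shape 2: post generator. Input is a topic plus bullet points;
# output is a complete post in your voice.
generator_pair = {
    "prompt": "Topic: widget trends\n- bullet one\n- bullet two\n\n###\n\n",
    "completion": " A complete blog post in your voice.",
}

for pair in (style_pair, generator_pair):
    print(json.dumps(pair))
```

Same file format either way; the only difference is what the input side of each pair looks like, which is why it has to match how you'll actually use the tuned model.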
Thanks for the info both, really appreciate it. Also, the engagement in this community is insane!
I think it just clicked… fine-tuning is functionally the same as few shot learning but:
with a much larger example set, and
resulting in permanent changes to the weights of a model instance rather than just “greasing the groove” using the prompt itself.
The fine-tuning set just increases the probability of outputs like your example outputs in response to inputs like your example inputs.
So whatever the input is that you expect to use with the tuned model (either existing text to stylize/convert, or prompts like the ones you want to use to generate a blog post in your style), you need that as the input examples in your tuning set.
Yes this is exactly what I am saying. The fine-tune basically acts as a skin, and you could create multiple fine-tunes to create multiple skins. Each fine-tune represents a different personality.
So as long as your input, say an article written by ChatGPT, is close to the neutral style your fine-tune was trained on, then the output of the fine-tune will pick up your voice and style. This is the main concept.
But a big pitfall that many make is using a fine-tune to supply knowledge and content. So they train on, for example, question/answer pairs. This almost always flops, so avoid it.
If you want to create a bot in your voice, you would use RAG. So retrieve content, push it through the fine-tune to add your voice, then send the styled content out.
This would add personality to the bot, and make it sound less like ChatGPT.
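That RAG-then-stylize flow is just two stages in sequence. Here is a skeleton where both `retrieve` and `stylize` are stand-ins: a real version would plug in an actual retriever (vector search, etc.) and a call to the style fine-tune, neither of which is shown here:

```python
def retrieve(query):
    """Stand-in for a real retriever (vector search, keyword index, etc.)."""
    return "Neutral facts relevant to: " + query

def stylize(neutral_text):
    """Stand-in for a call to the style fine-tune; a real version would
    send neutral_text to the tuned model and return its completion."""
    return "In your voice: " + neutral_text

def answer(query):
    # RAG step: fetch neutral content, then push it through the style skin.
    return stylize(retrieve(query))

print(answer("widget trends"))
```

The point of the structure is the separation: content comes from retrieval, voice comes from the fine-tune, and neither stage has to know about the other.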
This is mainly for peculiar domains that the model wasn’t trained on. But you can add personality for things it was trained on. For example, in the system prompt put “You are a cheesy sales bozo and full of sarcasm”. The model will oblige without needing a fine-tune.
But let’s say you want to create this cheesy sales bot as a fine-tune anyway, usually because the latency is much lower with a fine-tune. So you can use the prompted GPT to create the input/output pairs, and then create the fine-tune from them. Now you have a fast model that is a cheesy salesman.