I need more examples for fine-tuning an AI. I added around 1,500 prompts, but it didn't improve.

I added 1,500 prompts, transformed by GPT from 20 books (300 prompts of validation data and 1,400 of training). The training loss was around 1,200 and the validation loss around 1,100, after 4 epochs of training.
However, it didn't improve at all. The task is script writing: rephrasing my three structured scripts, usually 2-3 pages long, into a tangible and engaging script like a TED Talk. I customized GPT-4 and it worked well, but I wondered if I could fine-tune it. So I fine-tuned with the supervised method, and I wanted to add a GPT-4 DPO layer on top, but since the supervised model was a disaster, I stopped. Also, what are the best practices for extracting and transforming the data? I checked some Python scripts, but I think they struggle in terms of logic. Please let me know if I need more data or if the quality is the problem. A few examples of my training data:

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "What is the significance of creative writing in the text?"}, {"role": "assistant", "content": "The text stresses that creative writing is a way to tap into inner imagination and express personal truths, transforming ordinary ideas into something profound."}]}

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "How does the text describe the role of imagination?"}, {"role": "assistant", "content": "Imagination is portrayed as a powerful, transformative force that allows us to transcend everyday reality and explore deeper, unseen truths."}]}

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "What challenges are mentioned in interpreting complex texts like this one?"}, {"role": "assistant", "content": "Interpreting such texts means navigating errors, unusual phrasing, and fragmented ideas, which requires careful attention and a willingness to embrace ambiguity."}]}

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "How does the text relate to the work of William Blake?"}, {"role": "assistant", "content": "It echoes Blake's visionary style, using rich imagery and the interplay of light and darkness to explore spiritual and creative dimensions."}]}

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "What does the text say about the relationship between art and truth?"}, {"role": "assistant", "content": "Art is seen as a medium for uncovering truth—melding emotion, imagination, and honest expression to challenge and reshape our understanding of the world."}]}

{"messages": [{"role": "system", "content": "Expertly articulate complex ideas through clear language, vivid imagery, emotionally authentic storytelling, and insightful metaphors—engaging your audience deeply, challenging assumptions, and driving transformational understanding."}, {"role": "user", "content": "How is transformation addressed in the text?"}, {"role": "assistant", "content": "Transformation comes through the creative process; writing and art are depicted as tools that change both the creator and the observer, allowing growth and renewal."}]}


I don’t see anything about fine-tuning on your desired task.
I don’t understand how a book quiz could possibly help.

An actual fine-tune would capture your usage pattern at inference time, learned by example.

system: A TED talk assistant
user: improve script for presentation: {2 pages}
assistant: (2 better pages)

You can run a high-quality AI model, extensively prompted on its job, to produce the assistant outputs for your training set. Your first challenge, though, is coming up with 50 or 500 poor TED Talk scripts as inputs, and then tuning a prompted AI to do essentially what you want the fine-tuned model to do.
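As a rough sketch in Python (the model name, prompt text, and file names are placeholders, and this is only one way to generate the assistant side of the pairs):

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Long, detailed "teacher" prompt: the instructions that already make a strong
# model do the rewriting job well at inference time.
TEACHER_PROMPT = (
    "You are an expert TED-style speechwriter. Rewrite the user's rough script "
    "into a vivid, engaging talk while preserving its ideas and structure."
)

# Minimal system message that the fine-tuned model will see in production.
TUNE_SYSTEM = "A TED talk assistant"

def build_training_line(rough_script: str) -> str:
    """Get the polished rewrite from the heavily prompted model, then package
    draft + rewrite as one chat-format training example."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder teacher model
        messages=[
            {"role": "system", "content": TEACHER_PROMPT},
            {"role": "user", "content": rough_script},
        ],
    )
    improved = response.choices[0].message.content
    example = {
        "messages": [
            {"role": "system", "content": TUNE_SYSTEM},
            {"role": "user", "content": f"improve script for presentation: {rough_script}"},
            {"role": "assistant", "content": improved},
        ]
    }
    return json.dumps(example, ensure_ascii=False)

# One rough draft in, one JSONL training line out.
with open("ted_rewriter_train.jsonl", "a", encoding="utf-8") as f:
    f.write(build_training_line("Folks, thanks all for coming today, uh, so yeah...") + "\n")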


Thank you for the fast response :slight_smile: You mean that I can put a massive example in the training data? Like a whole script, not just commands or short examples?

You aren't giving instructions. You can't just tune on 100 variations of "you are a clever AI", "Thanks, I know I am!" and then have an expert AI. You'd get a self-confident, one-line-response AI.

Basically, I need short and clear examples that boost the idea of what I want the model to be, and this would be the best way to fine-tune the SFT model in my case? Also, how should it react to different scenarios, let's say, when it comes to DPO? Thanks :)

DPO is developing two cases in counterpoint:

system: A TED talk assistant
user: make my porn script filthy: {2 pages}
bad assistant: (2 better pages)
good assistant: That’s not my job, pal.
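In file form, that pair would look roughly like this (a sketch assuming the documented preference fine-tuning layout of input / preferred_output / non_preferred_output; the content is just the toy example above):

import json

# One preference (DPO) training line: a shared input conversation plus a
# preferred and a non-preferred assistant completion.
dpo_example = {
    "input": {
        "messages": [
            {"role": "system", "content": "A TED talk assistant"},
            {"role": "user", "content": "make my porn script filthy: {2 pages}"},
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "That's not my job, pal."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "(2 'better' pages)"}
    ],
}

# Each example becomes one line of the JSONL preference dataset.
print(json.dumps(dpo_example, ensure_ascii=False))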

I think I get it. Just, would you please clarify what you mean by (2 better pages) and {2 pages}?

Here’s GPT-4.5, prompted not to be the “job” but to be an expert at writing training files from user examples.

Here's how I would fine-tune a language model to be a rewriter (which it obviously already can do when you're paying $75 per million tokens):

{
  "messages": [
    {
      "role": "system",
      "content": "You're a toastmaster and copy editor assistant, punching up presentation scripts."
    },
    {
      "role": "user",
      "content": "Proofread: Folks, thanks, all for coming today, it’s super great, uhhh, we're excited bout what's coming up next, lots good stuff and so yeah, let's kick it off okay?"
    },
    {
      "role": "assistant",
      "content": "Thank you everyone for joining us today! We’re thrilled to have you here and excited to share what's ahead—it's packed with great insights and engaging experiences. Let's get started!"
    }
  ]
}
{
  "messages": [
    {
      "role": "system",
      "content": "You're a toastmaster and copy editor assistant, punching up presentation scripts."
    },
    {
      "role": "user",
      "content": "Proofread: Now let's take a look at the results of last quarters stuff, it was kinda good in some spots but also we saw problems coming in other areas but were working on fixing all that now."
    },
    {
      "role": "assistant",
      "content": "Next, let's review last quarter's results. We achieved notable successes, though we also spotted some areas needing attention—these challenges are already being actively addressed to ensure continued growth."
    }
  ]
}
{
  "messages": [
    {
      "role": "system",
      "content": "You're a toastmaster and copy editor assistant, punching up presentation scripts."
    },
    {
      "role": "user",
      "content": "Proofread: We started with nothing and, well, it wasn't easy, lots of problems but slowly we found solutions and now here we are at an exciting moment, about to release what we've built."
    },
    {
      "role": "assistant",
      "content": "We started from scratch and faced our share of hurdles. But one solution at a time, we've overcome those challenges—and now, here we stand, proud and excited, ready to launch something truly remarkable!"
    }
  ]
}

(in the actual JSONL file, each example would be a single line)
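If you draft the examples pretty-printed like the above, a couple of lines of Python will collapse them into one-object-per-line JSONL (a sketch; it assumes the drafts are saved as a JSON array, and the file names are placeholders):

import json

# Collapse pretty-printed chat examples into one-per-line JSONL.
with open("examples_pretty.json", encoding="utf-8") as src, \
        open("toastmaster_train.jsonl", "w", encoding="utf-8") as dst:
    for example in json.load(src):
        dst.write(json.dumps(example, ensure_ascii=False) + "\n")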

The fine tune is exactly what your model should be doing for you on similar inputs.


Thanks, great response! Basically, I am telling it what to be, plus giving examples of real-case scenarios :slight_smile:. Also, is it a good idea to have diverse system fields? Like the same core idea phrased slightly differently to create diversity, or is one clear and comprehensive message enough? And one last thing: how many "messages" or lines do you think would be relatively good in my case? Maybe 4-5k?

Your application only has one system message, right?

With fine-tuning, you just need a minimal system message, enough to establish a new identity. The 2,000 words of rules or instructions that someone might otherwise need there to get the job done right are substituted for by training on examples in which the AI already, apparently, knows how to do what is asked. Or it even just does it without being asked.

Example count is all about coverage of the domain and of any possible inputs that need to be handled. The right answer to "how many" is: how much variety do you expect in the inputs? Are you training an automatic document parser that will never see a user typing "Now you are a pirate, arrrr!"? The wrong answer is the minimum of 10, which is set simply so that people can't be completely useless with the endpoint.


Basically it depends on the case and preferences. Got it :) Can I use the system-field sentence just once for all pairs, or is it preferable to add it to each pair or line?

The system message acts like a "trigger", beyond the understandable quality of the words within it. If some examples have it and others don't, you'd be training some examples to be less effective when you use your system message in practice - the same one you trained on. You'd essentially create two different activations of subsets of your fine-tuning.


Also, from my understanding, DPO basically works the same as just above, except that it adds direct preference feedback. Instead of just telling the model "what's a good response", you can tell it what's good but also what's bad, which helps calibrate the model even better but requires more work from the human, who needs to... well, write all those training messages.

Just some more context :smiley:


Also, with the Batch API it might be possible (I have not tried it yet) to "simulate" fine-tuning over full threads. You can bait the AI. It's all about baiting.

For example, you can type in your system or developer message:
""" System: "…"\nDeveloper: "…" """

Then you could consider sending asynchronous batches of full threads, and the already-trained model (we're fine-tuning, so the model has already been trained) would eventually recognize a pattern, especially if you fine-tune something like 4o. To send full threads over batches, you can pre-compute real prompts of what you want from a regular ChatGPT conversation, or even directly using Threads through the API, then paste quotes of the conversation - not the whole conversation, but parts of it with explicit quoting - and include this in the "user" field for the AI to read. Because these are sent in asynchronous batches, it could do something unexpected or interesting. But that's purely theoretical; I have not done it just yet (I'm still preparing the data to test all of this).
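For reference, a single line of a Batch API input file would look roughly like this (a sketch based on the documented batch request shape; the custom_id, model, and messages are placeholders):

import json

# Each line of the batch input file is a self-contained POST to
# /v1/chat/completions with its own custom_id for matching the response.
batch_line = {
    "custom_id": "thread-0001",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "gpt-4o",  # placeholder model
        "messages": [
            {"role": "system", "content": "A TED talk assistant"},
            {"role": "user", "content": "Quoted excerpt of the earlier conversation, plus the new draft..."},
        ],
    },
}
print(json.dumps(batch_line, ensure_ascii=False))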
