Is this fine-tuning approach right?

I’m using gpt-4o + RAG to generate documents the way they’re made in my company. It’s working well, but I need it to achieve better results, so I’m fine-tuning it. These are the kinds of entries I’m writing in my .jsonl:

When the model replied correctly with a well-made document:

{"role": "user", "content": "Make me a document within this info"},{"role": "assistant", "content": "Right document"}

When the model failed:

{"role": "user", "content": "Make me a document within this info ..."},{"role": "assistant", "content": "Wrong document"},{"role": "user", "content": "You failed in the following points: .... Make it again"},{"role": "assistant", "content": "Right document"}

Is this how I’m supposed to do it?

Hi, there’s too little background here to say anything meaningful. At first glance, it looks like the task is too complex to handle in one run. A better design of the workflow would probably help, but as I said, there’s too little info (input size, input format, type of document, output format, length, complexity, etc.) to help.


Sorry, I’m actually looking for a more generic answer. Regardless of the task, I just want to confirm whether this is the correct way to build a fine-tuning job, or whether the “I reply when it’s wrong and let it be when it’s okay” approach is flawed 🙂 Thank you for replying!


As a more general approach, all of those tools are good: RAG, fine-tuning, assistants, coding…

The question is whether they’re suited to your specific goal, and here the goal is not clear. From the little snippets I see, the examples don’t even fit into a single step, so fine-tuning on them as-is would basically break your application. But without more information about what you’re trying to achieve, it’s like writing on water with a stick.

  1. Step/operation one: generate a great document.
  2. Step/operation two: grab the evaluation criteria from settings that define what is good vs. what is wrong.
  3. Step/operation three: evaluate the generated document against those criteria.
  4. Step/operation four: read the evaluation response and either reply with “wrong” or pass the document further down the flow.

#1 and #3 work better when fine-tuning is involved (in #1, RAG + code over multiple steps also helps the fine-tuned model).

#2 - code + RAG (optional)
#4 - code
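A minimal sketch of how those four steps could be wired together, assuming the OpenAI Python SDK; the model names, the criteria.txt file, and the PASS/FAIL convention for the evaluator are all placeholders, not something from the original post:

```python
from openai import OpenAI

client = OpenAI()

def generate_document(info: str) -> str:
    # Step 1: generate the document (placeholder fine-tuned model id).
    resp = client.chat.completions.create(
        model="ft:gpt-4o-2024-08-06:my-org::abc123",
        messages=[{"role": "user", "content": f"Make me a document with this info: {info}"}],
    )
    return resp.choices[0].message.content or ""

def load_criteria() -> str:
    # Step 2: plain code (optionally backed by RAG) that fetches the evaluation criteria.
    with open("criteria.txt", encoding="utf-8") as f:
        return f.read()

def evaluate_document(document: str, criteria: str) -> str:
    # Step 3: a second (possibly fine-tuned) model judges the document.
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder evaluator model
        messages=[
            {"role": "system", "content": "Answer only PASS or FAIL: <reason>."},
            {"role": "user", "content": f"Criteria:\n{criteria}\n\nDocument:\n{document}"},
        ],
    )
    return resp.choices[0].message.content or ""

def run(info: str, max_retries: int = 2) -> str:
    # Step 4: plain code routes the result — retry on FAIL, otherwise pass the doc on.
    criteria = load_criteria()
    document = generate_document(info)
    for _ in range(max_retries):
        verdict = evaluate_document(document, criteria)
        if verdict.strip().upper().startswith("PASS"):
            break
        document = generate_document(f"{info}\n\nPrevious attempt failed: {verdict}")
    return document
```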