Is fine-tuning GPT-3.5 the best option for carefully editing raw transcript content for better readability?

Hi, I hope somebody can offer some advice. I'm struggling to get a large number of transcripts edited and cleaned up nicely. GPT-4 eventually worked with a script, a careful prompt, and chunking, but it was far too expensive for the number of transcripts I have, so I need to use 3.5 or another method.

GPT-3.5 Turbo simply won't work well enough with prompts, no matter what I try; it only works with a big example in the UI. Doesn't that indicate a good candidate use case for simple fine-tuning?

So should I fine-tune GPT-3.5 Turbo to get it to edit transcripts in this very specific way?

Do I just add a bunch of sample single sentences extracted from transcripts, along with the edited result, covering a range of different edits, and it will get the idea? Or would you submit larger chunks of transcript samples, say 500 words, so it can see a few entire input-vs-output transcript samples and know what I want? Or a lot of both?

Edit: One important question: how do I add an input for something I simply want removed completely, i.e. edited out, like fluff I want gone from the transcript?

Or is there another easier solution? Thanks!

Hi there and welcome to the forum!

Two cents from my end on this. It's definitely worth giving a fine-tuned GPT-3.5 Turbo a try here. The way I would approach it is to create pairs of unedited content and edited content in the desired style. In your system prompt, include clear instructions so the model understands what you are trying to achieve. There you can also specify under which circumstances you want certain pieces of text removed. I would likely use slightly larger paragraph samples (anywhere between 200 and 500 words).
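To make the pair idea concrete, here is a minimal sketch of how the training file could be assembled in the chat-format JSONL that OpenAI fine-tuning expects. The system prompt wording and the sample pairs are placeholders I made up, not tested prompts or real transcript data. Note the second pair: one way to teach removal (your edit question) is to include fluff inside a larger chunk whose edited version simply omits it.

```python
import json

# Hypothetical system prompt -- adjust the wording to your own editing rules.
SYSTEM_PROMPT = (
    "You are a transcript editor. Rewrite the user's raw transcript text "
    "for readability: fix grammar, remove filler and fluff, keep the meaning."
)

# Each pair: raw transcript chunk -> desired edited version.
# Pair 2 shows removal: the "like and subscribe" fluff is simply absent
# from the edited output.
pairs = [
    (
        "so um basically what we what we did was we uh we ran the test again",
        "We ran the test again.",
    ),
    (
        "anyway moving on, quick reminder to like and subscribe, "
        "the results were good, really good actually",
        "The results were very good.",
    ),
]

def to_jsonl(pairs, system_prompt=SYSTEM_PROMPT):
    """Format (raw, edited) pairs as chat fine-tuning JSONL lines."""
    lines = []
    for raw, edited in pairs:
        example = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": raw},
                {"role": "assistant", "content": edited},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)

# Write the training file you would upload for the fine-tuning job.
with open("training.jsonl", "w") as f:
    f.write(to_jsonl(pairs))
```

You would build up a few dozen such pairs (or more) from your own transcripts and upload the resulting `training.jsonl` when creating the fine-tuning job.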

As for the system prompt itself, one additional point: I have recently adopted a strategy where I specify the desired characteristics of the output as a list of principles, i.e. I write something along the lines of "In executing task XYZ, please adopt the following principles: (1) xxx, (2) xxx, (3) xxx." This can include aspects like writing style, level of detail, etc. I have found this to work well; maybe it's something you'll find useful in your case too.
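For illustration, the principles-list style of system prompt could be built like this; the three example principles are assumptions I chose for the transcript-editing case, not a prompt I have benchmarked.

```python
def build_system_prompt(task, principles):
    """Render a task description plus a numbered list of principles."""
    numbered = " ".join(f"({i}) {p}." for i, p in enumerate(principles, 1))
    return f"In executing {task}, please adopt the following principles: {numbered}"

prompt = build_system_prompt(
    "transcript editing",
    [
        "preserve the speaker's meaning exactly",
        "remove filler words and false starts",
        "keep sentences short and plain",
    ],
)
print(prompt)
```

The same string would then be used as the `system` message both in the training pairs and at inference time, so the model sees a consistent instruction.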

Maybe run a few tests by creating a couple of different fine-tuned models using different variations of the training dataset and system prompt to see what works best. It's often a matter of trial and error: even with a fairly small training set you can get an initial idea of how well your fine-tuned model will work, and then create your final fine-tuned model from a larger training set.
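When building those training-set variations from long transcripts, a simple word-count chunker keeps samples inside the suggested 200-500 word range. This is a minimal sketch; real transcripts would ideally be split on sentence or speaker boundaries rather than raw word counts.

```python
def chunk_words(text, max_words=500):
    """Split a transcript into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Placeholder transcript of 1200 identical words, just to show the shapes.
transcript = ("word " * 1200).strip()
chunks = chunk_words(transcript)  # -> chunks of 500, 500, and 200 words
```

Each chunk then gets a hand-edited counterpart to form one training pair.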

Great, thanks a lot! I will try this, much appreciated.