Fine-tuning GPT-3.5 while incorporating human feedback

I am developing a fine-tuned model for a client (a lead-gen agency) to help them write emails for their clients by learning the unique style, tone, and structure of their existing content.

I have fine-tuned a model and the outputs are probably 60% of the way there, but they need some improvement. I have built an application that allows the client to rate the outputs, provide feedback, and save the final versions they use.

How can I incorporate this into a new fine-tuning process so the model reinforces the feedback they provide and learns from the differences between the initial AI output and their final version?

Practically, is it possible to fine-tune a model using not only:
‘system’, ‘user’, ‘assistant’ messages
but a ‘system’, ‘user’, ‘feedback’, ‘final output’ type of approach?

Would this work, or do fine-tuned models not learn in this way?

Any help would be much appreciated.

Hi - your suggested approach is not possible.
You can continue to fine-tune an already fine-tuned model. However, the dataset needs to be consistent with the standard conventions and must therefore follow the system, user, and assistant messages approach.
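To make the fixed convention concrete, here is a minimal sketch of one training record in the required chat format, serialized as a single JSONL line (the email content and prompts are illustrative, not from this thread):

```python
import json

# A fine-tuning record must use the fixed "messages" convention:
# system / user / assistant roles only. All content below is made up
# purely to show the shape of one record.
record = {
    "messages": [
        {"role": "system", "content": "You write outreach emails in the agency's voice."},
        {"role": "user", "content": "Write a follow-up email to a SaaS founder."},
        {"role": "assistant", "content": "Hi Alex, just circling back on my last note..."},
    ]
}

# Each record becomes exactly one line of the JSONL training file.
line = json.dumps(record)
print(line)
```

Roles like ‘feedback’ or ‘final output’ have no place in this schema, which is why the proposed four-role format is rejected by the fine-tuning pipeline.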


The feedback you are collecting represents additional potential training items.
If these new samples are very close to the initial data used for fine-tuning, then it makes sense to continue training the existing model.

If, however, you find that there are differences, you could consider starting fine-tuning from scratch with a higher-quality dataset.

You could also try sharing some examples and the settings you used for fine-tuning.


As mentioned above by @jr.2509, the conventions for the input to the fine-tuning setup are fixed, so altering them is not possible. However, there is another approach you could apply.

As an additional step to incorporate the feedback into the fine-tuning dataset, you could separate out the generations that received feedback (negative feedback specifically) and run another script that generates a “golden sample” by rewriting the email with the feedback incorporated.

You could then use this output as a sample in your training dataset.
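A minimal sketch of that “golden sample” step, assuming each saved item records the original output, the reviewer’s feedback, and a rating (the field names, sample data, and prompt wording are all hypothetical):

```python
import json

def build_rewrite_messages(original_email: str, feedback: str) -> list[dict]:
    """Build the chat messages one could send to a model so it rewrites the
    email with the reviewer feedback applied, producing a 'golden sample'.
    Prompt wording here is illustrative, not a fixed API requirement."""
    return [
        {"role": "system", "content": "Rewrite the email so it fully addresses "
                                      "the reviewer feedback, keeping the original tone."},
        {"role": "user", "content": f"Email:\n{original_email}\n\nFeedback:\n{feedback}"},
    ]

# Hypothetical items exported from the rating application.
samples = [
    {"email": "Hey! Buy now!!", "feedback": "Too pushy; soften the ask.", "rating": -1},
    {"email": "Hi Sam, loved your post...", "feedback": "", "rating": 1},
]

# Keep only the generations that received negative feedback.
negative = [s for s in samples if s["rating"] < 0]
payloads = [build_rewrite_messages(s["email"], s["feedback"]) for s in negative]
print(json.dumps(payloads[0], indent=2))
```

Each rewritten email would then be paired with its original prompt as the assistant message of a new training record.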

Thanks so much for your responses!

What I think you’re suggesting is that the only way to fine-tune a model is to provide positive examples for it to learn from; reinforcing behaviours with negative examples or descriptive feedback isn’t supported in the current fixed setup.

So, to incorporate the negative feedback, we’d have to turn it into a positive example for the model to learn from, rather than explaining why the original output was bad.
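Since the application already saves the final versions the client actually uses, the most direct positive example pairs the original request with that human-edited final email. A sketch, with hypothetical prompt and email content:

```python
import json

def to_training_record(system_prompt: str, user_request: str, final_email: str) -> str:
    """Turn one reviewed item into a JSONL fine-tuning line: the human-edited
    final email becomes the assistant target, i.e. a positive example."""
    record = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_request},
            # The saved final version, not the raw model output:
            {"role": "assistant", "content": final_email},
        ]
    }
    return json.dumps(record)

line = to_training_record(
    "You write outreach emails in the agency's voice.",        # illustrative
    "Write a follow-up email to a SaaS founder.",              # illustrative
    "Hi Alex, just following up on my note from last week...", # illustrative
)
print(line)
```

The model never sees the feedback text itself; it only learns the corrected target behaviour.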