Using RLHF with fine-tuned models

Is it possible to use RLHF with fine-tuned OpenAI models? We have a fine-tuned model that could benefit from it.

Some would say your fine-tune is already serving the role of RLHF, since you are explicitly modifying the model with your own training data.

You can also fine-tune a fine-tune by specifying it as the base model in a new fine-tuning job.
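For example, a minimal sketch with the OpenAI Python SDK (v1.x), where the training file ID and fine-tuned model name are placeholders, not real resources:

```python
from openai import OpenAI

client = OpenAI()

# Pass an existing fine-tuned model as the base model for a new job.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",                   # placeholder file ID
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",  # placeholder fine-tuned model name
)
print(job.id, job.status)
```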

What will RLHF bring to your fine-tune? What were you hoping here?

The model output provides design variations to users, and we feel it could really benefit from reward-based feedback from end users when subsequently fine-tuned. For example, if outputs A, C and D were selected or given a thumbs up by x users, then outputs like A, C and D should receive a higher reward estimate. A rough sketch of how that feedback could be collected is below.
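Illustrative only: one way to turn thumbs-up/thumbs-down signals into preference pairs that a reward model (or further training) could consume. The data structures and field names here are assumptions, not an OpenAI API.

```python
from collections import defaultdict

# Raw feedback events: (prompt_id, output_id, thumbs_up)
feedback = [
    ("design-42", "A", True),
    ("design-42", "B", False),
    ("design-42", "C", True),
    ("design-42", "D", True),
]

# Group votes per prompt, then pair every liked output with every disliked one.
by_prompt = defaultdict(lambda: {"chosen": [], "rejected": []})
for prompt_id, output_id, thumbs_up in feedback:
    key = "chosen" if thumbs_up else "rejected"
    by_prompt[prompt_id][key].append(output_id)

preference_pairs = [
    {"prompt": prompt_id, "chosen": c, "rejected": r}
    for prompt_id, votes in by_prompt.items()
    for c in votes["chosen"]
    for r in votes["rejected"]
]
print(preference_pairs)
# [{'prompt': 'design-42', 'chosen': 'A', 'rejected': 'B'}, ...]
```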

If you want dynamic outputs based on trending data, I would use RAG and not a fine-tune, to capture these changes in near real-time.

This is because you can’t easily “un-train” a fine-tune.

But you can quickly get your desired outputs using the in-context learning that RAG provides.
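A minimal sketch of that idea: retrieve currently preferred designs from your own store and inject them into the prompt at request time, so user preferences update without retraining. Retrieval here is a naive keyword overlap purely for illustration, and the model name and store contents are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# In practice this would be a database or vector store refreshed from user feedback.
trending_designs = [
    {"id": "A", "description": "minimalist two-column layout, muted palette"},
    {"id": "C", "description": "bold hero banner with large serif headline"},
    {"id": "D", "description": "card grid with rounded corners and soft shadows"},
]

def retrieve(query: str, k: int = 2):
    """Return the k designs whose descriptions share the most words with the query."""
    words = set(query.lower().split())
    scored = sorted(
        trending_designs,
        key=lambda d: len(words & set(d["description"].split())),
        reverse=True,
    )
    return scored[:k]

query = "landing page with a bold headline"
context = "\n".join(f"- {d['id']}: {d['description']}" for d in retrieve(query))

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Suggest design variations. Favour styles similar to these "
                       "recently preferred designs:\n" + context,
        },
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
```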

That’s a good idea. Thanks

Maintaining and continuously updating multiple RAG chains for all design categories might get tricky, but I guess it's worth a shot, as it might also lower our training costs.
