Using RLHF with fine-tuned models

Is it possible to use RLHF with fine-tuned OpenAI models? We have a fine-tuned model that could benefit from it.

Some would say your fine-tune is already serving the role of RLHF, since you are explicitly modifying the model with your own training data.

You can also fine-tune a fine-tune by specifying it as the base model in a new fine-tuning job.
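For example, a minimal sketch with the OpenAI Python SDK (v1.x), where the training file ID and fine-tuned model name are placeholders, not real resources:

```python
from openai import OpenAI

client = OpenAI()

# Pass an existing fine-tuned model as the base model for a new job.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",                   # placeholder file ID
    model="ft:gpt-3.5-turbo-0125:my-org::abc123",  # placeholder fine-tuned model name
)
print(job.id, job.status)
```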

What will RLHF bring to your fine-tune? What were you hoping here?

The model output provides design variations to users, and we feel it could really benefit from reward-based feedback from end users when subsequently fine-tuned. For example, if outputs A, C and D were selected or given a thumbs up by x users, then outputs like A, C and D should receive a higher reward estimate. A rough sketch of how that feedback could be collected is below.
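Illustrative only: one way to turn thumbs-up/thumbs-down signals into preference pairs that a reward model (or further training) could consume. The data structures and field names here are assumptions, not an OpenAI API.

```python
from collections import defaultdict

# Raw feedback events: (prompt_id, output_id, thumbs_up)
feedback = [
    ("design-42", "A", True),
    ("design-42", "B", False),
    ("design-42", "C", True),
    ("design-42", "D", True),
]

# Group votes per prompt, then pair every liked output with every disliked one.
by_prompt = defaultdict(lambda: {"chosen": [], "rejected": []})
for prompt_id, output_id, thumbs_up in feedback:
    key = "chosen" if thumbs_up else "rejected"
    by_prompt[prompt_id][key].append(output_id)

preference_pairs = [
    {"prompt": prompt_id, "chosen": c, "rejected": r}
    for prompt_id, votes in by_prompt.items()
    for c in votes["chosen"]
    for r in votes["rejected"]
]
print(preference_pairs)
# [{'prompt': 'design-42', 'chosen': 'A', 'rejected': 'B'}, ...]
```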

If you want dynamic outputs based on trending data, I would use RAG and not a fine-tune, to capture these changes in near real-time.

This is because you can’t easily “un-train” a fine-tune.

But you can quickly get your desired outputs using the in-context learning that RAG provides.
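A minimal sketch of that idea: retrieve currently preferred designs from your own store and inject them into the prompt at request time, so user preferences update without retraining. Retrieval here is a naive keyword overlap purely for illustration, and the model name and store contents are placeholders.

```python
from openai import OpenAI

client = OpenAI()

# In practice this would be a database or vector store refreshed from user feedback.
trending_designs = [
    {"id": "A", "description": "minimalist two-column layout, muted palette"},
    {"id": "C", "description": "bold hero banner with large serif headline"},
    {"id": "D", "description": "card grid with rounded corners and soft shadows"},
]

def retrieve(query: str, k: int = 2):
    """Return the k designs whose descriptions share the most words with the query."""
    words = set(query.lower().split())
    scored = sorted(
        trending_designs,
        key=lambda d: len(words & set(d["description"].split())),
        reverse=True,
    )
    return scored[:k]

query = "landing page with a bold headline"
context = "\n".join(f"- {d['id']}: {d['description']}" for d in retrieve(query))

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Suggest design variations. Favour styles similar to these "
                       "recently preferred designs:\n" + context,
        },
        {"role": "user", "content": query},
    ],
)
print(response.choices[0].message.content)
```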

That’s a good idea. Thanks

Maintaining and continuously updating multiple RAG chains for all design categories might get tricky, but I guess it's worth a shot, as it might also lower our training costs.
