Fine-Tuning with Reinforcement Learning from Human Feedback

Is it possible to fine-tune an OpenAI model with Reinforcement Learning from Human Feedback? If yes, how can I do that?


Definitely something I would like to know about as well. From what I can see currently, we have to retrain our custom model on the full JSONL file of prompt-completions, including the newly added ones.


Yes, ultimately that’s all you have available with current fine-tune models: the inputs/outputs you train on and your evaluation of the final product. If you don’t like it, you start again.

Held-out validation datasets, weights and biases, training continuation: basically anything that lets you see how well the model performs doesn’t seem to be available. OpenAI: “the best solution to poor results is to turn off the results.”

What can you do with an OpenAI model? Gather varied, high-temperature outputs from your fine-tuned model for user inputs (where 1.0 now counts as high), send them off for human evaluation, pick the best (and send the inputs where every output was bad off for humans to answer), then retrain on more of the best.
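The selection step of that workflow can be sketched as below. This is a minimal illustration, not an OpenAI feature: it assumes the sampling and human rating have already happened elsewhere, that ratings are on a hypothetical 1 to 5 scale with an assumed "good enough" threshold of 4, and that the output uses the prompt/completion JSONL format mentioned earlier in the thread. Note that this is supervised fine-tuning on human-filtered outputs (sometimes called best-of-n or rejection sampling), not RLHF proper.

```python
import json

# Hypothetical data: for each user input, several high-temperature samples
# from the fine-tuned model, each paired with a human rating (1-5, higher is better).
rated_samples = {
    "How do I reset my password?": [
        ("Click 'Forgot password' on the login page.", 5),
        ("Passwords cannot be reset.", 1),
    ],
    "What is your refund policy?": [
        ("I don't know.", 1),
        ("We never give refunds.", 1),
    ],
}

MIN_ACCEPTABLE = 4  # assumed threshold for "good enough to train on"


def select_best(rated, min_score):
    """Split prompts into training examples (the best acceptable sample) and
    prompts where every sample was bad, so a human must write the answer."""
    keep, needs_human = [], []
    for prompt, samples in rated.items():
        completion, score = max(samples, key=lambda s: s[1])
        if score >= min_score:
            keep.append({"prompt": prompt, "completion": completion})
        else:
            needs_human.append(prompt)
    return keep, needs_human


def to_jsonl(examples):
    """Serialize examples as one JSON object per line."""
    return "\n".join(json.dumps(ex) for ex in examples) + "\n"


keep, needs_human = select_best(rated_samples, MIN_ACCEPTABLE)
print(to_jsonl(keep), end="")
print("Needs a human-written answer:", needs_human)
```

The human-written answers for the "all bad" prompts would then be appended to the same file before the next retraining run.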

Here’s an OpenAI article on RLHF alignment of LLMs, “Aligning language models to follow instructions”. It’s just a basic overview with a touch of what the training inputs can do, and you can immediately see what you can’t do.
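For reference, the InstructGPT paper (Ouyang et al., 2022) behind that article trains a reward model r_θ on human rankings of K outputs per prompt, where y_w is the preferred and y_l the less preferred completion, then optimizes the policy against that reward with PPO plus a KL penalty (weight β) toward the supervised model. Both steps need direct access to the model’s weights, which the fine-tuning API described in this thread does not expose. Transcribed here as a reference only; the paper’s PPO-ptx variant adds a pretraining log-likelihood term that is omitted below:

```latex
\mathrm{loss}(\theta) = -\frac{1}{\binom{K}{2}}\,
  \mathbb{E}_{(x, y_w, y_l) \sim D}\left[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]

\mathrm{objective}(\phi) = \mathbb{E}_{(x, y) \sim D_{\pi_\phi^{\mathrm{RL}}}}\left[r_\theta(x, y)
  - \beta \log \frac{\pi_\phi^{\mathrm{RL}}(y \mid x)}{\pi^{\mathrm{SFT}}(y \mid x)}\right]
```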

I had considered doing something similar to what you described: fine-tune on, say, 100 examples, have the model generate a 101st, correct its output, and then add that as the 101st training example. However, that 101st example would be about 90% AI-generated. Will that result in data contamination and thus a loss in performance?
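The bootstrap loop described above, plus a rough way to see how much of each corrected example still comes from the model, can be sketched as follows. This is a hypothetical sketch: `generate` and `human_fix` are stand-ins for a call to your fine-tuned model and for your review step, and `difflib.SequenceMatcher.ratio()` is only a character-level similarity score, not a measure of authorship. It can help you track where your data sits, but it does not by itself tell you whether that hurts performance; only evaluating the retrained model can.

```python
from difflib import SequenceMatcher


def retained_fraction(model_draft: str, human_fixed: str) -> float:
    """Similarity ratio in [0, 1] between the model's draft and the corrected
    text; a rough proxy for how much of the final example came from the model."""
    return SequenceMatcher(None, model_draft, human_fixed).ratio()


def grow_dataset(dataset, new_prompts, generate, human_fix):
    """One round of the bootstrap loop: draft with the current model, let a
    human correct the draft, and append the corrected pair to the dataset.
    Returns (prompt, retained_fraction) pairs for review."""
    report = []
    for prompt in new_prompts:
        draft = generate(prompt)
        fixed = human_fix(prompt, draft)
        dataset.append({"prompt": prompt, "completion": fixed})
        report.append((prompt, retained_fraction(draft, fixed)))
    return report


# Hypothetical stand-ins so the sketch runs without any API.
def fake_generate(prompt):
    return "The capital of France is Paris, I think."


def fake_fix(prompt, draft):
    return "The capital of France is Paris."


data = [{"prompt": f"q{i}", "completion": f"a{i}"} for i in range(100)]
report = grow_dataset(data, ["What is the capital of France?"], fake_generate, fake_fix)
print(len(data))  # 101
for prompt, frac in report:
    print(f"{prompt!r}: {frac:.2f} similar to the model draft")
```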

That’s my only worry. I can see why it might not be a problem, since you’re only improving the AI’s answer rather than giving it new pre-training data, but I’m wondering where the line is.