Fine-Tuning with Reinforcement Learning from Human Feedback

Is it possible to fine-tune an OpenAI model with Reinforcement Learning from Human Feedback (RLHF)? If so, how can I do that?


Definitely something I would like to know about as well. From what I can see, we are currently obliged to re-train our custom model on the full JSONL file of prompt-completions, including the added ones.


Yes, ultimately that’s all you have available with the current fine-tune models: the inputs/outputs you train on, and your own evaluation of the final product. If you don’t like the result, you start again.

Held-out validation datasets, weights and biases, training continuation: basically anything that would show how well the model performs doesn’t seem to be available. OpenAI: “the best solution to poor results is to turn off the results.”

What can you do with an OpenAI model? Gather varied, high-temperature outputs from your tuned model for real user inputs (where 1.0 now counts as high), send them off to be human-evaluated, pick the best (and send the ‘all bad’ cases off to be answered by a human), then retrain on more of the best.
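The loop described above could be sketched roughly like this. It is only an illustration: `sample` and `rate` are hypothetical callables you would supply yourself (one wrapping whatever call you use to get a high-temperature completion from your tuned model, the other returning a human rater's score), and `n_samples` and `min_score` are made-up knobs. The output uses the prompt/completion JSONL shape mentioned earlier in the thread.

```python
import json
from typing import Callable, Iterable


def build_retraining_set(
    prompts: Iterable[str],
    sample: Callable[[str], str],       # hypothetical: returns one completion for a prompt
    rate: Callable[[str, str], float],  # hypothetical: human score for (prompt, completion)
    n_samples: int = 4,                 # assumed number of varied outputs per prompt
    min_score: float = 1.0,             # assumed cut-off below which a sample counts as bad
):
    """Keep the best-rated completion per prompt; flag prompts where all samples are bad."""
    keep, needs_human_answer = [], []
    for prompt in prompts:
        # Gather several varied candidates for the same user input.
        candidates = [sample(prompt) for _ in range(n_samples)]
        # Have each candidate scored by the human raters.
        scored = [(rate(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= min_score:
            keep.append({"prompt": prompt, "completion": best})
        else:
            # "All bad": a person should write the answer instead.
            needs_human_answer.append(prompt)
    return keep, needs_human_answer


def to_jsonl(records) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)
```

You would append the `to_jsonl` output to your existing training file and run a new fine-tune on the combined data. Note this is best-of-n selection followed by ordinary supervised fine-tuning, not RLHF proper: no reward model is trained and no policy-gradient step is involved.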

Here’s an OpenAI article on RLHF alignment of LLMs: “Aligning language models to follow instructions”. It gives just the basic overview and a touch of what the training inputs can do, and you can immediately see what you can’t do.