Fine-Tuning with Reinforcement Learning from Human Feedback

Is it possible to fine-tune an OpenAI model with Reinforcement Learning from Human Feedback? If so, how can I do that?


Definitely something I would like to know about as well. From what I can see currently, we’re obliged to re-train our custom model with the added prompt-completions (the full JSONL prompt-completion file).


Yes, ultimately that’s all you have available with the current fine-tune models: the inputs/outputs you train them on and your evaluation of the final product. If you don’t like the result, you start again.

Held-out validation datasets, weights and biases, training continuation, and basically anything else that would show you how well the model performs don’t seem to be available. OpenAI’s apparent stance: “the best solution to poor results is to turn off the results.”

What can you do with an OpenAI model? Gather varied high-temperature completions from your fine-tune for real user inputs (where 1.0 now counts as high), send them off to be human-evaluated, pick the best (and send the “all bad” cases off for a human to answer), then retrain on more of the best.
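To make that loop concrete, here’s a minimal sketch, assuming the pre-1.0 `openai` Python package (`openai.Completion.create`) and a hypothetical fine-tuned model id; the human-review step in the middle is whatever tooling you have for collecting judgments.

```python
# Sketch of the "sample high-temperature candidates, have humans pick, retrain" loop.
# Model id and file names are placeholders; adapt the API call for newer client versions.
import json
import openai

FINE_TUNED_MODEL = "davinci:ft-your-org-2023-01-01"  # hypothetical fine-tune id


def sample_candidates(prompt: str, n: int = 4) -> list[str]:
    """Draw several varied completions at high temperature for human review."""
    response = openai.Completion.create(
        model=FINE_TUNED_MODEL,
        prompt=prompt,
        temperature=1.0,   # "high" on the current 0-1 scale
        n=n,
        max_tokens=256,
    )
    return [choice["text"] for choice in response["choices"]]


def build_training_file(reviewed: list[dict], path: str = "round2.jsonl") -> None:
    """Write human-approved (or human-written) completions back into the
    prompt-completion JSONL format used for the next fine-tune round."""
    with open(path, "w") as f:
        for item in reviewed:
            # item = {"prompt": ..., "best_completion": ...}, where best_completion is
            # the reviewer's pick, or written from scratch when all candidates were bad.
            f.write(json.dumps({
                "prompt": item["prompt"],
                "completion": " " + item["best_completion"].strip(),
            }) + "\n")
```

Each round’s output JSONL then becomes the training file for the next fine-tune.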

Here’s an OpenAI article on RLHF-based LLM alignment, just the basic overview and a touch of what the training inputs can do; you can also immediately see what you can’t do: Aligning language models to follow instructions

I had considered doing something akin to what you described: fine-tuning on 100 examples, for instance, then getting the model to generate a 101st, fixing that output, and reusing it as the 101st training sample. However, that 101st training example would be 90% AI-generated. Will that result in data contamination and therefore performance loss?

That’s my only worry. I can see how it might not, since you’re just improving the AI’s answer rather than giving it new pre-training data, but I’m wondering where the line is.