Fine-Tuning with Reinforcement Learning from Human Feedback

Is it possible to fine-tune an OpenAI model with Reinforcement Learning from Human Feedback (RLHF)? If yes, how can I do that?


This is definitely something I would like to know as well. From what I can see, the only option at the moment is to fine-tune the model again on a dataset that includes the newly added prompt-completion pairs.
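
For what it's worth, here is a rough sketch of what that re-training route could look like in practice. It is not RLHF proper (there is no reward model and no policy optimization). It only filters logged completions by a human rating and writes the approved ones out as prompt-completion pairs in JSONL, ready to be used as fine-tuning data. The `rating` field, the `min_rating` threshold, the helper names, and the sample data are all assumptions made up for illustration.

```python
import json


def build_finetune_records(feedback, min_rating=4):
    """Keep only the completions that human raters scored at or above min_rating."""
    records = []
    for item in feedback:
        if item["rating"] >= min_rating:
            # Only the prompt and completion go into the training file;
            # the rating was just used to decide what to keep.
            records.append(
                {"prompt": item["prompt"], "completion": item["completion"]}
            )
    return records


def to_jsonl(records):
    """Serialize the records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical feedback log: each entry is a model output plus a 1-5 human rating.
feedback = [
    {"prompt": "Translate 'hello' to French ->", "completion": " bonjour", "rating": 5},
    {"prompt": "Translate 'cat' to French ->", "completion": " chien", "rating": 1},
]

records = build_finetune_records(feedback)
jsonl_text = to_jsonl(records)
print(jsonl_text)
```

The resulting text can be saved to a `.jsonl` file and used as the dataset for the next fine-tuning run, so each round of human feedback becomes another round of supervised fine-tuning.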