RLHF after Fine-Tuning Davinci?

How many examples are in your training set for the model? At what scale are you trying to do RLHF?

One simple way to add human feedback to your dataset is to have your fine-tuned model generate several completions (say, 3) for the same prompt at a temperature greater than 0, so the outputs actually differ from one another.

Then have a person pick the best one and save that prompt/completion pair to your dataset.
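A minimal sketch of that loop, assuming your dataset is a JSONL file of prompt/completion records (the format legacy davinci fine-tuning used). `generate` is a hypothetical stand-in for whatever call you use to sample `n` completions at a given temperature, and `choose` stands in for the human reviewer. Note that this is best-of-n selection feeding back into supervised fine-tuning, not a full reward-model RLHF pipeline.

```python
import json
from typing import Callable, List


def collect_best_completion(
    prompt: str,
    generate: Callable[[str, int, float], List[str]],  # hypothetical sampler: (prompt, n, temperature) -> candidates
    choose: Callable[[str, List[str]], int],  # returns the index of the preferred candidate
    n: int = 3,
    temperature: float = 0.7,
) -> dict:
    """Sample n candidate completions and keep the one the reviewer picks."""
    if temperature <= 0:
        # At temperature 0 sampling is (near-)deterministic, so the candidates would barely differ.
        raise ValueError("temperature must be > 0 to get varied candidates")
    candidates = generate(prompt, n, temperature)
    best_index = choose(prompt, candidates)
    return {"prompt": prompt, "completion": candidates[best_index]}


def append_to_dataset(path: str, record: dict) -> None:
    """Append one record as a single JSON line (assumed JSONL dataset format)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def ask_human(prompt: str, candidates: List[str]) -> int:
    """Interactive chooser: show the candidates and read the reviewer's pick from stdin."""
    print(f"Prompt: {prompt}")
    for i, text in enumerate(candidates):
        print(f"[{i}] {text}")
    while True:
        answer = input("Number of the best completion: ").strip()
        if answer.isdigit() and int(answer) < len(candidates):
            return int(answer)
```

In practice you would pass `ask_human` as `choose` and wrap your own completion call in `generate`. Keeping the chooser as a plain function also makes it easy to swap in a second reviewer or log disagreements later.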