RLHF after Fine-Tuning Davinci?

markhennings · July 11, 2023, 2:11pm

How many examples are in your training set for the model? At what scale are you trying to do RLHF?

One simple way to work human feedback into your dataset is to prompt your model to generate several completions for each prompt (let's say 3) at a temperature greater than 0, so that the samples actually differ from one another.

Then have a person pick the best one and save it to your dataset.
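
Here is a rough sketch of that loop in Python, just to make the idea concrete. Everything named here is an assumption for illustration: `generate` stands in for whatever call you use to sample from your fine-tuned model, `choose` stands in for the human reviewer's pick, and the prompt/completion JSONL layout is assumed because it is the format legacy fine-tuning for base models like davinci expects.

```python
import json
from typing import Callable, Dict, List


def collect_best_of_n(
    prompt: str,
    generate: Callable[[str, float], str],   # hypothetical: samples one completion from your model
    choose: Callable[[str, List[str]], int],  # hypothetical: reviewer returns index of the best candidate
    n: int = 3,
    temperature: float = 0.7,
) -> Dict[str, str]:
    """Sample n completions and return a training record for the reviewer's pick."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if temperature <= 0:
        # At temperature 0 the samples tend to be identical, so there is nothing to choose between.
        raise ValueError("temperature must be greater than 0")

    candidates = [generate(prompt, temperature) for _ in range(n)]
    best_index = choose(prompt, candidates)
    return {"prompt": prompt, "completion": candidates[best_index]}


def append_jsonl(record: Dict[str, str], path: str) -> None:
    """Append one record as a line of JSON to the dataset file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

In practice `choose` could be as simple as printing the candidates and reading a number from `input()`. Once you have collected enough picks, you would fine-tune again on the extended file.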
