How to perform RLHF on an OpenAI model

OpenAI provides an API for fine-tuning models. How can I then do RLHF on top of the fine-tuned model?

  1. Train a reward model.
  2. Use RL (reinforcement learning) to tune the fine-tuned model again, based on the reward model.
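The two steps above can be illustrated on toy data. This is a minimal, framework-free sketch, assuming a linear reward model over hand-made feature vectors and a softmax "policy" over three canned responses; `FEATURES`, `PREFERENCES`, and every function name are hypothetical, and nothing here calls the OpenAI API. Step 1 fits the reward model with a pairwise (Bradley-Terry) loss on preference pairs. Step 2 replaces the sampled, PPO-style update used on real language models with an exact, KL-penalized policy gradient, which is only possible because the toy output space has three items.

```python
import math

# Hypothetical toy data: each candidate response is a 2-d feature vector
# standing in for what a real reward model would compute from text.
FEATURES = {
    "helpful": [1.0, 0.2],
    "verbose": [0.6, 1.0],
    "rude": [-1.0, 0.1],
}
# Hypothetical human labels: (preferred response, rejected response).
PREFERENCES = [("helpful", "rude"), ("helpful", "verbose"), ("verbose", "rude")]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, x):
    """Linear reward model r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(prefs, lr=0.1, epochs=300, weight_decay=0.01):
    """Step 1: minimise -log sigmoid(r(chosen) - r(rejected)) with SGD."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in prefs:
            a, b = FEATURES[chosen], FEATURES[rejected]
            # Loss gradient w.r.t. w is -(1 - sigmoid(margin)) * (a - b),
            # so a descent step adds lr * g * (a - b); weight decay shrinks w.
            g = 1.0 - sigmoid(reward(w, a) - reward(w, b))
            w = [wi * (1 - lr * weight_decay) + lr * g * (ai - bi)
                 for wi, ai, bi in zip(w, a, b)]
    return w


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(rewards, ref_probs, beta=1.0, lr=0.5, steps=2000):
    """Step 2: maximise E_pi[r] - beta * KL(pi || pi_ref) by exact policy gradient.

    pi_ref stands in for the already fine-tuned model that RL should not
    drift too far from.
    """
    logits = [math.log(p) for p in ref_probs]  # start from the reference policy
    for _ in range(steps):
        probs = softmax(logits)
        # KL-shaped reward per response: r_i - beta * (log pi_i - log pi_ref_i).
        shaped = [r - beta * (math.log(p) - math.log(q))
                  for r, p, q in zip(rewards, probs, ref_probs)]
        baseline = sum(p * s for p, s in zip(probs, shaped))
        # Softmax policy gradient: dJ/dlogit_k = pi_k * (shaped_k - baseline).
        logits = [z + lr * p * (s - baseline)
                  for z, p, s in zip(logits, probs, shaped)]
    return softmax(logits)


names = list(FEATURES)
w = train_reward_model(PREFERENCES)
rewards = [reward(w, FEATURES[n]) for n in names]
ref = [1 / len(names)] * len(names)  # hypothetical uniform "fine-tuned model"
policy = train_policy(rewards, ref)
for n, p in zip(names, policy):
    print(f"{n}: {p:.3f}")
```

The KL term is the design choice that matters: the optimum of this objective is pi proportional to pi_ref * exp(r / beta), so a larger `beta` keeps the tuned policy closer to the starting fine-tuned model. With a real model the expectation in step 2 cannot be enumerated, so it is estimated from sampled completions instead.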

Is this possible?
