OpenAI provides an API for fine-tuning models. How can I do RLHF on top of the fine-tuned model?
- Train a reward model.
- Use RL (reinforcement learning) to tune the fine-tuned model again, based on the reward model.
Is this possible?
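
To make the first step concrete, this is roughly the objective I have in mind for the reward model: the usual pairwise preference loss, which pushes the score of the preferred response above the score of the rejected one. It is only a minimal sketch in plain Python. The function name and the scalar reward inputs are my own placeholders, not part of the OpenAI API, and a real reward model would produce these scores from a neural network.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Hypothetical helper for illustration only. The two arguments stand in
    for the scalar scores a reward model assigns to the preferred and the
    rejected response for the same prompt.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # The loss is smaller when the preferred response scores higher.
    print(pairwise_reward_loss(2.0, 0.0))
    print(pairwise_reward_loss(0.0, 2.0))
```

The second step would then use these reward scores as the training signal for an RL algorithm such as PPO, which needs the ability to sample from the fine-tuned model and update its weights.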