Hi, has anyone implemented their own RLHF process? I'm wondering whether to
- build a new model from the annotated, human-verified answers AND
- ask ChatGPT to generate multiple responses (with explanations) AND
- pick the one that gets the best score from the RLHF model
or whether I should just build an embedding DB with the annotated data and enrich the prompt.
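
To make the two options concrete, here is a rough sketch of what I have in mind. Everything in it is a stand-in: `embed` is a toy bag-of-words function where a real setup would call an embedding model, the `score` callable stands in for the model trained on the annotated answers, and the names `best_of_n` and `build_prompt` are made up for illustration. Note that the first option as described is closer to best-of-n reranking with a scoring model than to full RLHF fine-tuning.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Option 1: generate several candidates, keep the one the scoring model likes most.
def best_of_n(candidates: List[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate; `score` is a placeholder for the trained model."""
    return max(candidates, key=score)


# Option 2: look up the closest annotated examples and put them in the prompt.
def build_prompt(question: str, annotated: List[Tuple[str, str]], k: int = 2) -> str:
    """Prepend the k annotated (question, answer) pairs most similar to `question`."""
    q_vec = embed(question)
    ranked = sorted(
        annotated, key=lambda qa: cosine(q_vec, embed(qa[0])), reverse=True
    )
    examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in ranked[:k])
    return f"{examples}\nQ: {question}\nA:"
```

The first option needs a trained scorer plus several generations per query, while the second only needs the annotated data to be indexed, so the second is cheaper to try first.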
Has anyone tried this?