Hi, has anyone implemented their own RLFH process? I'm wondering whether to
- build a new model from the annotated, human-verified answers AND
- ask ChatGPT to generate multiple responses (with explanations) AND
- pick the one that gets the best score from the RLFH model
or whether I should just build an embedding DB from the annotated data and use it to enrich the prompt.
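To make the two options concrete, here is a rough sketch of what I have in mind. Everything in it is a placeholder: `embed` is a toy bag-of-words stand-in for a real embedding model, `score_fn` stands in for whatever model gets trained on the annotated answers, and `pick_best` / `enrich_prompt` are hypothetical names, not part of any library.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real setup would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pick_best(candidates, score_fn):
    """Option 1: generate several responses, keep the one the scoring model rates highest."""
    return max(candidates, key=score_fn)


def enrich_prompt(question, annotated, k=2):
    """Option 2: prepend the k most similar annotated (question, answer) pairs to the prompt."""
    q = embed(question)
    ranked = sorted(annotated, key=lambda qa: cosine(q, embed(qa[0])), reverse=True)
    examples = "\n".join(f"Q: {a}\nA: {b}" for a, b in ranked[:k])
    return f"{examples}\nQ: {question}\nA:"
```

For option 1, `candidates` would be the multiple ChatGPT responses and `score_fn` the trained scorer; for option 2, `annotated` would be the human-verified data, and the returned string is what gets sent as the prompt.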
Has anyone tried this?
Thanks
sps
Just FYI, it’s RLHF.
Using the correct acronym will help ensure you get better results.
Oops - should use ChatGPT for typos 