I'm working on implementing Reinforcement Learning from Human Feedback (RLHF) with the OpenAI APIs and want to confirm whether my feedback structure is suitable for this use case. My goal is to serve model responses, collect user feedback on them, and use a reward signal for fine-tuning in an RLHF-like setup. This is the structure I'm using for each feedback record (a sketch of how I assemble them follows the JSON):
{
  "messages": [
    {
      "role": "system",
      "content": "You are a virtual assistant that responds to customer email requests with an action and a polite message."
    },
    {
      "role": "user",
      "content": {
        "feedback": {
          "prompt": "",
          "given_response": {
            "subject": "",
            "message": "",
            "action": ""
          },
          "correct_response": {
            "subject": "",
            "message": "",
            "action": ""
          },
          "error_message": "Model Selected Wrong Subject",
          "reward": 1
        }
      }
    }
  ]
}
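For context, here is a minimal sketch of how I assemble and store these records on my side. The `EmailResponse` dataclass, the `build_feedback_record` helper, and the `feedback_records.jsonl` file name are my own placeholders, not part of any OpenAI API:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EmailResponse:
    subject: str
    message: str
    action: str

def build_feedback_record(prompt: str,
                          given: EmailResponse,
                          correct: EmailResponse,
                          error_message: str,
                          reward: int) -> dict:
    """Wrap one piece of human feedback in the chat-style structure shown above."""
    return {
        "messages": [
            {
                "role": "system",
                "content": ("You are a virtual assistant that responds to customer "
                            "email requests with an action and a polite message."),
            },
            {
                "role": "user",
                "content": {
                    "feedback": {
                        "prompt": prompt,
                        "given_response": asdict(given),
                        "correct_response": asdict(correct),
                        "error_message": error_message,
                        "reward": reward,
                    }
                },
            },
        ]
    }

# Append each record as one line of JSONL (all values here are placeholders).
record = build_feedback_record(
    prompt="...",
    given=EmailResponse(subject="...", message="...", action="..."),
    correct=EmailResponse(subject="...", message="...", action="..."),
    error_message="Model Selected Wrong Subject",
    reward=1,
)
with open("feedback_records.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```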
My Key Questions:
- Can this structure be used to simulate RLHF, i.e., to provide feedback on a given response?
- What is the best reward range to use for RLHF fine-tuning (e.g., [-1, 1] vs. [-5, 5])? A sketch of the rescaling I have in mind follows this list.
- Does OpenAI’s API currently support RLHF directly, or would this approach only work for collecting data for future fine-tuning?
- Should I use RLHF or RAG (Retrieval-Augmented Generation) for this purpose?
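To make the reward-range question concrete: as far as I understand, [-1, 1] and [-5, 5] only differ by a linear rescaling, so my real question is whether the scale itself matters for the fine-tuning pipeline. A small illustration (the function name is just mine):

```python
def rescale_reward(reward: float,
                   src: tuple[float, float] = (-1.0, 1.0),
                   dst: tuple[float, float] = (-5.0, 5.0)) -> float:
    """Linearly map a reward from the source range to the destination range."""
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    return dst_lo + (reward - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)

# Example: a reward of 1 on [-1, 1] corresponds to 5.0 on [-5, 5].
print(rescale_reward(1.0))   # 5.0
print(rescale_reward(-0.5))  # -2.5
```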