Are there any plans to support SFT fine-tuned models as graders in reinforcement fine-tuning?

Hi OpenAI team and @Karan_Sharma,

I am very excited to see that reinforcement fine-tuning (RFT) is now available for the o4-mini series. One thing I would like is to use one of my SFT fine-tuned models as the grader in the RFT pipeline. Why? My SFT fine-tuned model performs well on in-distribution inputs but fails to capture out-of-distribution cases. If I use it as the grader for RFT, I think RL could generalize beyond the patterns in my training data and produce a much more robust model.
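For concreteness, here is roughly what I have in mind, sketched against the `score_model` grader config from the RFT docs. The `ft:` model ID and file ID below are placeholders, and as far as I can tell the grader `model` field currently only accepts base models, which is exactly what I am asking about:

```python
from openai import OpenAI

client = OpenAI()

# score_model grader, per the RFT grader docs, but pointing at my own
# SFT checkpoint instead of a base model. The ft: ID is a placeholder.
grader = {
    "type": "score_model",
    "name": "sft_model_grader",
    "model": "ft:gpt-4o-2024-08-06:my-org::PLACEHOLDER",  # <- what I'd like to use
    "input": [
        {
            "role": "user",
            "content": (
                "Score the candidate answer against the reference, from 0 to 1.\n"
                "Reference: {{ item.reference_answer }}\n"
                "Candidate: {{ sample.output_text }}"
            ),
        }
    ],
}

# Launch the RFT job with that grader (training file ID is a placeholder).
job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",
    training_file="file-PLACEHOLDER",
    method={
        "type": "reinforcement",
        "reinforcement": {"grader": grader},
    },
)
```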

Maybe I am wrong, but does this logic make sense? If so, are there any plans to add support for fine-tuned models as graders? I think there is huge potential here if we can use them.
