Is documentation for the upcoming reinforcement fine-tuning available?

Reinforcement fine-tuning is currently only available to selected alpha testers and will be made available in early 2025. It’s a complex, compute-intensive, and potentially dangerous capability; I understand that.

But is there an argument against making documentation available now on how it will work once it’s released? I am talking about things like how to define graders and which evaluators are available. This would let developers prepare their datasets in advance.
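Even a rough, hypothetical sketch of the expected training-file format would be enough to start collecting data. Something along these lines is what I have in mind (the field names below are pure guesses for illustration, since nothing about the RFT data format has been published):

```python
import json

# Purely hypothetical RFT training records: a prompt plus a reference
# answer that a grader could score against. The actual schema for
# reinforcement fine-tuning has not been published; the field names
# "messages" and "reference_answer" are guesses for illustration only.
examples = [
    {
        "messages": [{"role": "user", "content": "What is 17 * 24?"}],
        "reference_answer": "408",
    },
    {
        "messages": [{"role": "user", "content": "Name the capital of Australia."}],
        "reference_answer": "Canberra",
    },
]

# Existing fine-tuning products consume JSONL, so writing one record per
# line seems like a reasonable way to prepare data in advance.
with open("rft_training_guess.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```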

So… what happened to this? I was really excited about the ‘reinforcement fine-tuning’ feature, and it seems to have just been walled off to mystery alpha testers?


I had a suspicion that I needed to cross-check with the original announcement video.

Could it be that the evaluators for RFT are the same as the ones you can define in the evaluations API?
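For reference, this is roughly the shape of the graders you can define in the Evals product today (written from memory, so treat the exact field names as an assumption); whether RFT reuses this shape is exactly what I would like to see confirmed:

```python
import json

# Two grader/evaluator configurations in the style of the Evals product.
# Field names are reconstructed from memory and may not match the current
# API exactly; this only illustrates the kind of definition I mean.
string_check_grader = {
    "type": "string_check",
    "name": "exact_answer_match",
    "input": "{{ sample.output_text }}",          # model output under test
    "reference": "{{ item.reference_answer }}",   # expected answer from the dataset row
    "operation": "eq",
}

model_grader = {
    "type": "score_model",
    "name": "answer_quality",
    "model": "gpt-4o-mini",  # a judge model scores the response
    "input": [
        {
            "role": "user",
            "content": "Rate how well this answer matches the reference.\n"
                       "Answer: {{ sample.output_text }}\n"
                       "Reference: {{ item.reference_answer }}",
        }
    ],
}

print(json.dumps([string_check_grader, model_grader], indent=2))
```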

To answer this question, which was probably self-answered: your documentation.

https://platform.openai.com/docs/guides/fine-tuning#preference

There is but one model available.

Fine-tuning has previously been rolled out per model to those who have actively used fine-tunes before.

Hi @_j,

the fine-tuning method you are showing is Direct Preference Optimization (DPO), also known as Preference Fine-Tuning.

It is not Reinforcement Fine-Tuning, which is what my question was about.

I already have access to DPO (AFAIK everyone does).
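For completeness, this is roughly what a DPO job looks like per the preference fine-tuning guide linked above (a minimal sketch; the file name, model, and beta value are illustrative, and the `method` parameter assumes a recent version of the openai Python SDK):

```python
from openai import OpenAI

client = OpenAI()

# A DPO (preference fine-tuning) training record is one JSONL line with a
# prompt plus a preferred and a non-preferred assistant response, e.g.:
#
# {"input": {"messages": [{"role": "user", "content": "Summarize DPO in one sentence."}]},
#  "preferred_output": [{"role": "assistant", "content": "DPO tunes a model on pairs of ranked responses."}],
#  "non_preferred_output": [{"role": "assistant", "content": "DPO is a database protocol."}]}

# Upload the preference dataset (file name is illustrative).
training_file = client.files.create(
    file=open("preference_pairs.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tuning job with the DPO method; beta is illustrative.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
    method={"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
)
print(job.id)
```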

There is no documentation available for Reinforcement Fine-Tuning yet. (At least not for me; the documentation can change depending on whether you are logged in.)