AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

Yes! It is possible to run supervised fine-tuning (SFT) and then preference fine-tuning on the resulting model. We have seen good results from running SFT first, followed by DPO.
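A minimal sketch of what that two-stage setup could look like with the fine-tuning API's `method` field; the file IDs, model names, and hyperparameter value below are placeholders for illustration, not real resources, and no API calls are made here.

```python
# Sketch of the two-stage recipe: SFT first, then DPO on the SFT checkpoint.
# Payloads follow the fine-tuning API's "method" schema; all IDs and model
# names are placeholders.

sft_job = {
    "model": "gpt-4o-mini-2024-07-18",
    "training_file": "file-sft-demo",  # chat-formatted JSONL (placeholder ID)
    "method": {"type": "supervised"},
}

# One line of the preference (DPO) training file: a prompt plus a
# preferred and a non-preferred completion.
preference_example = {
    "input": {"messages": [{"role": "user", "content": "Summarize this report."}]},
    "preferred_output": [{"role": "assistant", "content": "Concise, accurate summary."}],
    "non_preferred_output": [{"role": "assistant", "content": "Rambling, off-topic summary."}],
}

dpo_job = {
    # Start DPO from the model produced by the SFT job, not the base model.
    "model": "ft:gpt-4o-mini:org::sft-checkpoint",  # placeholder SFT output model
    "training_file": "file-dpo-demo",  # preference-pair JSONL (placeholder ID)
    "method": {"type": "dpo", "dpo": {"hyperparameters": {"beta": 0.1}}},
}

print(dpo_job["method"]["type"])
```

The key point is the second stage's `model` field: it points at the checkpoint produced by the SFT job, so DPO refines the SFT model rather than the base model.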