Yes! You can run supervised fine-tuning (SFT) first and then run preference fine-tuning. We have seen good results from running SFT followed by DPO.
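To illustrate why the two stages fit together, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes the reference log-probabilities come from the SFT checkpoint, which is a common setup but not something confirmed here. The function name `dpo_loss`, its parameters, and the default `beta=0.1` are hypothetical and chosen only for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response.
    Assumption: the reference values come from the frozen SFT model.
    """
    # How far the policy has moved from the reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# At the start of DPO the policy equals the reference, so the loss is log(2).
start = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
```

In this sketch the loss only depends on how the policy differs from the reference, so starting DPO from a model that already follows the target format (the SFT output) gives the preference stage a sensible baseline to move away from.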