Yes, synthetic data generation for DPO is definitely possible. One important thing to keep in mind is that the evaluator you use to rank generated samples should apply consistent criteria across data points, so the preference labels are comparable. It is also possible to fine-tune for multi-turn conversations: just put the final assistant message in the preferred / non-preferred output.
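To make the multi-turn format concrete, here is a minimal sketch of one preference example, assuming an OpenAI-style DPO fine-tuning layout (the field names `input`, `preferred_output`, and `non_preferred_output` are illustrative): the earlier conversation turns go in the input, and only the final assistant message differs between the two outputs.

```python
import json

# One DPO training example for a multi-turn conversation (sketch).
# The conversation history lives in the input; the preferred and
# non-preferred outputs each hold only the final assistant turn.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "What's the capital of France?"},
            {"role": "assistant", "content": "Paris."},
            {"role": "user", "content": "And roughly how big is it?"},
        ]
    },
    "preferred_output": [
        {"role": "assistant",
         "content": "The city proper has about 2.1 million residents."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "It's a city."}
    ],
}

# Training files are typically JSONL: one example per line.
line = json.dumps(example)
print(line)
```

The same pattern repeats for every example: however long the history is, the preference comparison is only over the last assistant reply.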