How to think about a DPO dataset's "non-preferred output"?

For best training results, should the "non-preferred output" be:

the undesired output that the model is most likely to produce,

or something else?
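
For reference, here's a minimal sketch of the kind of record I'm asking about. I'm assuming the prompt/chosen/rejected field names used by common DPO trainers (e.g. TRL's DPOTrainer); the exact schema may differ in your setup, and the contents are made up for illustration:

```python
import json

# Hypothetical example of a single DPO preference record; field names
# (prompt / chosen / rejected) follow a common convention but are an
# assumption here, not a fixed requirement.
record = {
    "prompt": "Summarize the following article in two sentences: ...",
    # Preferred (chosen) completion.
    "chosen": "A concise, faithful two-sentence summary.",
    # Non-preferred (rejected) completion -- the field this question is about.
    # One possible strategy: sample this from the current model so it reflects
    # the undesired outputs the model is actually likely to produce.
    "rejected": "A rambling, five-paragraph summary that ignores the length limit.",
}

# DPO datasets are often stored as JSONL, one record per line.
with open("dpo_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```

So the question is really about how that "rejected" field should be sourced: sampled from the model itself, written by hand, or something else entirely.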