Fine-tune the GPT-4.1 family using direct preference optimization

You can now fine-tune the GPT-4.1 family using direct preference optimization.

https://x.com/openaidevs/status/1932858051876565475?s=46

https://platform.openai.com/docs/guides/direct-preference-optimization
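For anyone who wants to try it right away, here's a minimal sketch of starting a DPO job through the fine-tuning API, following the guide linked above. The file ID is a placeholder, and the model snapshot name and beta value are assumptions; check the guide for current options.

```python
from openai import OpenAI

client = OpenAI()

# Start a DPO fine-tuning job on a GPT-4.1 snapshot.
# "file-abc123" is a placeholder for an uploaded preference dataset,
# and the snapshot name and beta value are illustrative assumptions.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4.1-2025-04-14",
    method={
        "type": "dpo",
        "dpo": {"hyperparameters": {"beta": 0.1}},
    },
)
print(job.id, job.status)
```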


That’s a cool one; I immediately see a lot of applications. Thanks.


I saw the news as soon as it came out and immediately got to work. Even with just 30 examples and default hyperparameters, I’m already seeing (small) improvements to my SFT model. Specifically, I was able to get it to take a less negative tone and avoid instances of runaway gibberish. I think DPO is a great way to iron out the kinks you get from an SFT run.
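In case it helps anyone reproduce this, each training example is one preference pair in the JSONL format from the guide above: a prompt plus a preferred and a non-preferred completion. The snippet below just writes one record; the prompt and both completions are invented for illustration.

```python
import json

# One DPO preference pair: training pushes the model toward the preferred
# completion and away from the non-preferred one. All content here is
# made up for illustration.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this support ticket in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "The customer cannot log in after updating to version 2.3."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Ugh, another login complaint. They probably typed the wrong password."}
    ],
}

# Append the record to the training file (one JSON object per line).
with open("dpo_train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```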