AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

We don’t have response pre-filling on the current roadmap, but we will keep it in mind! For DPO datasets, preference pairs can be obtained through human annotation or some kind of A/B testing flow. For synthetic data generation, you can also explore rejection sampling: sample several outputs from a model for the same prompt, then use an evaluation to pick the preferred and non-preferred ones.
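
To make the rejection-sampling idea concrete, here is a minimal sketch in Python. The model name, the `score()` heuristic, and the exact JSONL pair layout in `make_pair()` are illustrative assumptions on my part, not an official recipe; in practice you would swap in an LLM judge or task-specific check for the evaluator.

```python
# Rejection-sampling sketch for building DPO preference pairs.
# Assumptions: the model choice, score() heuristic, and JSONL layout
# below are illustrative, not an official format guarantee.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_completions(prompt: str, n: int = 4) -> list[str]:
    """Draw several candidate completions for the same prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=1.0,  # diversity matters for rejection sampling
    )
    return [choice.message.content for choice in resp.choices]

def score(prompt: str, completion: str) -> float:
    """Hypothetical evaluator: replace with an LLM judge, reward
    model, or task-specific check. Here: a trivial length heuristic."""
    return -abs(len(completion) - 200)

def make_pair(prompt: str) -> dict:
    """Keep the best-scoring sample as preferred, the worst as non-preferred."""
    candidates = sample_completions(prompt)
    ranked = sorted(candidates, key=lambda c: score(prompt, c), reverse=True)
    return {
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": ranked[0]}],
        "non_preferred_output": [{"role": "assistant", "content": ranked[-1]}],
    }

if __name__ == "__main__":
    prompts = ["Explain rejection sampling in two sentences."]
    with open("dpo_pairs.jsonl", "w") as f:
        for p in prompts:
            f.write(json.dumps(make_pair(p)) + "\n")
```

The key design point is that both outputs in each pair come from the same prompt, so the preference signal reflects output quality rather than prompt difficulty; a higher sampling temperature and larger `n` give the evaluator more spread to rank.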
