AMA on the 17th of December with OpenAI's API Team: Post Your Questions Here

  • will OpenAI also support the "prefill" feature for greater output control that Claude already has? It seems quite easy to implement, so is there a reason OpenAI does not support it? (A rough sketch of how Claude's version works is below, for reference.)
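
For reference, a minimal sketch of what "prefill" looks like on Claude's side, assuming Anthropic's Python SDK and a placeholder model name: the request simply ends with a partial assistant message, and the model continues from that exact prefix.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=200,
    messages=[
        {"role": "user", "content": "Describe the novel Dune as a JSON object."},
        # The "prefill": the conversation ends with a partial assistant turn,
        # so the model continues from exactly this prefix.
        {"role": "assistant", "content": "{"},
    ],
)

print("{" + response.content[0].text)  # reattach the prefix to get the full JSON
```

The equivalent on OpenAI's side would mean letting the last entry in `messages` be a partial assistant turn that the model continues rather than replies to.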

  • for direct preference optimization (DPO), is it better to sample the preferred and non-preferred responses from the model that is being fine-tuned, or does it not matter where they come from? It seems more logical to sample both from the same model, but it can be hard to get two responses in clearly different styles. (A sketch of how I picture a training pair is below.)
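
To make the question concrete, here is a hedged sketch of a single preference pair, assuming the JSONL layout OpenAI describes for preference fine-tuning (an input conversation plus a preferred and a non-preferred completion). The prompt and completions are invented; the idea is that both completions were sampled from the same base model and ranked afterwards, which is the scenario the question is about.

```python
import json

# One preference-fine-tuning (DPO) training example. Field names follow my
# understanding of OpenAI's preference-tuning JSONL format; the content is
# made up. Both outputs are imagined as samples from the same base model,
# with a human (or judge model) deciding which one is preferred.
example = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this support ticket in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Customer cannot reset their password after the latest app update."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "The customer wrote a long message about several issues, mostly related to passwords."}
    ],
}

# Append the example to a training file, one JSON object per line.
with open("dpo_train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```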