ChatGPT: which Fine-tuning method is used under the hood?

After reading this very interesting paper:

I’m wondering which method OpenAI uses under the hood of their Fine-tuning API: SFT, RLHF, or DPO.
The answer would greatly influence how the training datasets should be constructed, and would help explain good or bad behaviour during the training process, and so the final result.
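To illustrate why the method matters for dataset construction, here is a sketch contrasting the two data shapes. The SFT record follows the chat-format JSONL that OpenAI documents for fine-tuning; the preference-pair record is only illustrative of what DPO-style data looks like, and its field names are my assumption, not OpenAI's schema.

```python
import json

# SFT-style record: each line of the JSONL file is one chat transcript
# the model should learn to imitate (chat format documented by OpenAI).
sft_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# DPO-style record: instead of a single target completion, each record
# pairs a preferred answer with a non-preferred one, and training pushes
# the model toward the former. Field names here are hypothetical.
dpo_example = {
    "prompt": "What is the capital of France?",
    "preferred": "The capital of France is Paris.",
    "non_preferred": "France's capital is Lyon.",
}

# Each record would be written as one line of a .jsonl training file.
print(json.dumps(sft_example))
print(json.dumps(dpo_example))
```

The practical upshot: for SFT you only need demonstrations of the desired output, while preference-based methods (RLHF, DPO) additionally require a ranking between alternative outputs, which is much more expensive to collect.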

According to the latest leaks, OpenAI's retraining and refinement of existing models on top of supervised learning uses PPO.

The fine-tuning method used by the API has not been disclosed or published.