After reading this very interesting paper:
I’m wondering which method OpenAI uses in their Fine-tuning API: SFT, RLHF, or DPO.
This would greatly influence how the training datasets should be constructed, and could explain good or bad behaviour during training and, in turn, the quality of the final result.
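To make the dataset-construction point concrete, here is a sketch of how a single training record differs between the two styles. The SFT record follows the chat-format JSONL that OpenAI documents for supervised fine-tuning; the preference record is modelled on their preference (DPO) fine-tuning format, with a prompt plus a preferred and a non-preferred completion. Exact field names should be verified against the current API reference, so treat this as an illustration rather than a spec.

```python
import json

# SFT: each JSONL line is one full conversation the model should imitate.
sft_record = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is DPO?"},
        {"role": "assistant", "content": "Direct Preference Optimization, a..."},
    ]
}

# DPO-style preference tuning: each JSONL line pairs a preferred and a
# non-preferred completion for the same prompt, instead of a single target.
dpo_record = {
    "input": {
        "messages": [{"role": "user", "content": "What is DPO?"}]
    },
    "preferred_output": [
        {"role": "assistant", "content": "A clear, correct explanation..."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "A vague or incorrect explanation..."}
    ],
}

# Each record would be written as one line of a .jsonl training file.
print(json.dumps(sft_record))
print(json.dumps(dpo_record))
```

The practical consequence: an SFT dataset only needs good demonstrations, while a DPO-style dataset needs *pairs* of answers ranked against each other, which is a very different labelling effort.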