What fine-tuning strategy does the fine-tuning API use internally: full-parameter tuning or LoRA? I've recently been running experiments with the fine-tuning API, and this matters a lot for my research.
You can read about PPO (Proximal Policy Optimization); however, OpenAI won't officially comment on its technology.