Hi, I wonder what the fine-tuning service really is - is it something like parameter-efficient tuning (e.g., LoRA), or is it actually prompt tuning?
Hi and welcome to the developer forum!
No one knows exactly how OpenAI's tuning service works, but I think it's safe to assume it's a variation of a LoRA-style system; the mathematics kind of needs it to be, unless they have something radically different. I'm not sure what you mean by prompt tuning.
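To be clear about what I mean by "LoRA-style": freeze the original weights and train only a small low-rank correction on top of them. Nobody outside OpenAI knows whether that is what their service actually does; the snippet below is just a minimal PyTorch sketch of the general technique, with the class name, rank, and alpha all made up for illustration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # original weights stay frozen

        in_f, out_f = base.in_features, base.out_features
        # Only these low-rank factors are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))  # zero init => no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus the scaled low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```

Only lora_A and lora_B get gradients, which is why LoRA-style adapters are cheap to train, store, and swap per customer.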
The type of reinforcement learning done by fine-tune re-weighting is well detailed in the PPO and RLHF papers published before OpenAI went closed.
Thanks. By prompt tuning, I meant prefix tuning, P-tuning, etc., which only fine-tune a soft prompt instead of updating model weights.
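For contrast, here is the shape of what I mean: the base model is frozen entirely, and only a handful of "virtual token" embeddings, prepended to the input, get trained. A minimal sketch, assuming a model that accepts input embeddings directly; the wrapper name and the 20-token default are mine for illustration:

```python
import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    """Prepends trainable soft-prompt embeddings to the input; the wrapped model stays frozen."""

    def __init__(self, model: nn.Module, embed_dim: int, num_virtual_tokens: int = 20):
        super().__init__()
        self.model = model
        for p in self.model.parameters():
            p.requires_grad = False  # only the soft prompt is learned

        self.soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, embed_dim) token embeddings
        batch = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.model(torch.cat([prompt, input_embeds], dim=1))

# Toy usage with a stand-in "model" that consumes embeddings directly:
wrapped = SoftPromptWrapper(nn.Linear(32, 100), embed_dim=32)
out = wrapped(torch.randn(4, 10, 32))  # only the 20 x 32 soft prompt is trainable
```

The trainable parameter count here is just num_virtual_tokens x embed_dim, versus the full weight matrices a conventional fine-tune would touch.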
In the doc, OpenAI claims:
To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples.
This amount of data is usually considered insufficient for fine-tuning (including LoRA), but it makes sense for prompt tuning.
Ahh, understood. Well, 3.5 fine-tuning was quite some time in the making, so potentially both, I guess. It would be very interesting to be a fly on the wall.
I read somewhere that it's done using prompt tuning (P-tuning).
Thanks, cbora. Could you share the URL with me?
This paper has detailed descriptions of the methods OpenAI researchers used to train models for a specific task, given that they have full access to the model and its layers.
Fine-tuning being a refined product, the end-user scenario in this paper's nomenclature is label:demonstration pairs used for behavior cloning (supervised), with the model then re-weighted via PPO rewards.
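To unpack that recipe, the two stages look roughly like this: supervised behavior cloning on prompt/demonstration pairs, then reward-weighted policy-gradient updates. The toy below only shows the shape of the two objectives (a tiny made-up policy, random data, and a random stand-in for the reward model); real PPO adds ratio clipping and a KL penalty against the supervised reference, and none of this is confirmed about the 3.5 service:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in policy over a 100-token vocabulary (purely illustrative).
vocab_size, embed_dim = 100, 32
policy = nn.Sequential(nn.Embedding(vocab_size, embed_dim), nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stage 1: behavior cloning -- maximize the likelihood of the demonstration tokens.
prompt_tokens = torch.randint(0, vocab_size, (8,))  # fake "prompts"
demo_tokens = torch.randint(0, vocab_size, (8,))    # fake "demonstrations"
bc_loss = F.cross_entropy(policy(prompt_tokens), demo_tokens)
bc_loss.backward()
optimizer.step()
optimizer.zero_grad()

# Stage 2 (sketch): sample from the policy and re-weight by a reward signal.
logits = policy(prompt_tokens)
dist = torch.distributions.Categorical(logits=logits)
samples = dist.sample()
rewards = torch.rand(8)  # stand-in for a learned reward model
pg_loss = -(dist.log_prob(samples) * rewards).mean()
pg_loss.backward()
optimizer.step()
```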
Thanks for sharing, but I don't think that discloses any information about the latest fine-tuning service for GPT-3.5-Turbo.
Now I think it is safe to assume they use prompt tuning, because so little data is required and the tuning time is so short.