I was surprised to see that the language model fine-tuning is based on providing prompt-completion pairs. My understanding was that the base models available for fine-tuning (Davinci, Curie, Babbage, Ada) are trained using a next-word prediction task (causal/autoregressive language modeling). For that type of training, you would expect the input data to be a list of texts, not text pairs. Presumably the loss is being computed over the completion tokens only. That seems a bit inefficient. You can already see this in the “Case study: Customer support chatbot” example: previous conversation messages are repeated across different prompts. It seems it would be better in that case to provide a list of conversations as the input (less expensive for the API user). But perhaps they are doing something else for fine-tuning that I didn’t expect.
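To illustrate what I mean by “loss over the completion tokens only”: one common way to implement this (e.g. the convention used by Hugging Face Transformers) is to concatenate the prompt and completion into one sequence and mask the prompt positions in the labels, so cross-entropy is only computed where labels are unmasked. This is just a sketch of that idea — I don’t know whether OpenAI does it this way, and the token ids here are made up:

```python
# Hypothetical sketch of completion-only loss masking for a causal LM.
# -100 is the conventional "ignore this position" label id in libraries
# like Hugging Face Transformers; the loss skips those positions.

IGNORE_INDEX = -100

def build_labels(prompt_ids, completion_ids):
    """Concatenate prompt and completion into one input sequence,
    and mask the prompt positions so loss covers only the completion."""
    input_ids = prompt_ids + completion_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids
    return input_ids, labels

# Made-up token ids for illustration only.
prompt = [101, 102, 103]   # e.g. the tokenized prompt
completion = [201, 202]    # e.g. the tokenized completion

input_ids, labels = build_labels(prompt, completion)
print(input_ids)  # [101, 102, 103, 201, 202]
print(labels)     # [-100, -100, -100, 201, 202]
```

Under this scheme, a conversation with N turns would appear as N separate prompt-completion pairs, with each prompt repeating all earlier turns — which is the redundancy I’m referring to above.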
Does anyone here know more about the approach they are taking for fine-tuning?