The old method of fine-tuning took prompt-completion pairs:
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
Whereas it now takes a list of messages:
{"messages": [{"role": "system", "content": "Marv is a factual..."}...]}
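For reference, here is what I understand one complete training example in the new format to look like, as one JSON object per line of the .jsonl file (the conversation content below is made up for illustration):

```python
import json

# Hypothetical multi-turn conversation; the content is illustrative only.
example = {
    "messages": [
        {"role": "system", "content": "Marv is a factual chatbot."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Paris."},
        {"role": "user", "content": "And roughly how many people live there?"},
        {"role": "assistant", "content": "About 2 million within the city limits."},
    ]
}

# One serialized JSON object per line in the training file.
print(json.dumps(example))
```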
Maybe this is self-evident, but in this latest version, are we supposed to segment the messages ourselves?
What I mean is: a completed conversation is one list. In the new format, do I upload that list just once?
In the old version, I would segment the conversation so that each assistant turn was a completion and the prompt was everything before that turn.
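To make the old segmentation concrete, here is a minimal sketch of what I mean; the function name and the way I serialize the prompt are my own, not anything prescribed by the old format:

```python
def segment(messages):
    """Split a conversation into old-style prompt/completion pairs:
    each assistant turn becomes a completion, and the prompt is a
    plain-text serialization of every turn before it (the
    "role: content" serialization here is my own convention)."""
    pairs = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant":
            prompt = "\n".join(
                f'{m["role"]}: {m["content"]}' for m in messages[:i]
            )
            pairs.append({"prompt": prompt, "completion": msg["content"]})
    return pairs

# A 4-turn conversation yields 2 training pairs under this scheme.
conversation = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "How are you?"},
    {"role": "assistant", "content": "Fine, thanks."},
]

for pair in segment(conversation):
    print(pair["completion"])
```

Under the new format, the open question is whether that whole 4-turn conversation should instead be uploaded once as a single messages list.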
In this new version, it’s not explicitly stated what happens. My interpretation is that I should not segment a conversation, but that is me reading between the lines.
Does anyone have opinions, docs, or data on this?