Fine-tuning problem: multiple completions for the same prompt

I am preparing a JSONL dataset for fine-tuning.

I want to train the model to predict the sentence I want to type from the first letter of each word.

For example

“The weather is nice today”

So when I type “t”, “w”, “i”, “n”, “t”, I hope the model answers “The weather is nice today”.

But it could also be “That’s why I need teamwork”

So I prepared the dataset as follows:

{"prompt":"t w i n t","completion":"The weather is nice today"}
{"prompt":"t w i n t","completion":"That’s why I need teamwork"}
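For reference, records like these could be generated from a list of sentences with something like the sketch below. The helper names `initials_prompt` and `to_jsonl` are made up for illustration, and it assumes the prompt is simply the lowercase first letter of each word, separated by spaces.

```python
import json

def initials_prompt(sentence: str) -> str:
    """Return the lowercase first letter of each word, separated by spaces."""
    return " ".join(word[0].lower() for word in sentence.split())

def to_jsonl(sentences):
    """Build one prompt/completion JSON object per line (hypothetical helper)."""
    return "\n".join(
        json.dumps(
            {"prompt": initials_prompt(s), "completion": s},
            ensure_ascii=False,  # keep non-ASCII characters readable in the file
        )
        for s in sentences
    )

print(to_jsonl(["The weather is nice today", "That's why I need teamwork"]))
```

Both sentences reduce to the same prompt, which is exactly the situation I am asking about.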

Am I preparing the data the right way, or is there something I need to change?

No. Your data is JSONL compliant, but it does not meet the OpenAI data formatting requirements for fine-tuning.
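As a rough sketch, assuming the requirements in question are the legacy prompt/completion guidelines (end every prompt with a fixed separator, start every completion with a space, end every completion with a fixed stop sequence), the records could be reshaped like this. The separator and stop values and the `format_record` helper are illustrative choices, not something the API mandates; check the guide referenced below for the current rules.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed: fixed marker appended to every prompt
STOP = "\n"                # assumed: fixed stop sequence ending every completion

def format_record(prompt: str, completion: str) -> dict:
    """Apply the assumed legacy formatting conventions to one example."""
    return {
        "prompt": prompt + SEPARATOR,
        "completion": " " + completion + STOP,  # leading space, then stop sequence
    }

examples = [
    ("t w i n t", "The weather is nice today"),
    ("t w i n t", "That's why I need teamwork"),
]

for prompt, completion in examples:
    # json.dumps escapes the newlines, so each record stays on one JSONL line
    print(json.dumps(format_record(prompt, completion)))
```

At inference time the same separator would need to be appended to the prompt, and the stop sequence passed as the stop parameter.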

Reference:

Preparing Your Dataset
