Help with CLI data preparation tool

Hi,
I’m trying to prepare a TXT file for fine-tuning.
For that, I have pairs of prompts and completions.
I tried to use the CLI data preparation tool to save time, but no matter what I do, it treats everything as “completion” and leaves the prompt empty.
I tried different line layouts and separators, and adding the words “prompt” and “completion.” Even when I provide the actual format, the tool puts the whole line into the completion.
This is my input:
{"prompt":" Human: How do I use my favorite navigation app?\nAI:", "completion":" You can change the default navigation app at the system's settings at the navigation settings section."}

Here’s what comes out:
{"prompt":"","completion":" {"prompt":" Human: How do I use my favorite navigation app?\nAI:", "completion":" You can change the default navigation app at the system's settings at the navigation settings section."}"}

Any ideas?

Change your file's extension from .txt to .jsonl and use the suggested format; that is the easiest way to get it to work. The text option is meant only for unformatted files, where everything is assumed to be completion.
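
In case it helps, here is a minimal Python sketch of writing such a .jsonl file, assuming the prompt/completion pairs are already available in memory. The function name `write_jsonl` and the file name `train.jsonl` are just illustrative, not something the CLI tool requires. Letting `json.dumps` build each line avoids hand-typed quotes and escapes.

```python
import json

def write_jsonl(pairs, path):
    """Write (prompt, completion) pairs as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            # json.dumps emits straight double quotes and escapes newlines,
            # so each record stays on a single line.
            record = {"prompt": prompt, "completion": completion}
            f.write(json.dumps(record) + "\n")

# Example pair taken from the question above.
pairs = [
    (
        " Human: How do I use my favorite navigation app?\nAI:",
        " You can change the default navigation app at the system's settings"
        " at the navigation settings section.",
    ),
]
write_jsonl(pairs, "train.jsonl")
```

Each line of the resulting file is a standalone JSON object with a "prompt" key and a "completion" key.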

Thanks. What format should I use so the CLI tool recognizes which part is the “prompt” and which is the “completion”?

Alternatively, you can save a CSV file with two columns, one named “prompt” and the other named “completion”, in case that’s easier.
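
A rough sketch of the CSV route in Python, again assuming the pairs are already in memory. The names `write_csv` and `train.csv` are hypothetical. The only part that comes from the suggestion above is the two column headers, "prompt" and "completion".

```python
import csv

def write_csv(pairs, path):
    """Write (prompt, completion) pairs to a CSV with a header row."""
    # newline="" lets the csv module control line endings, which keeps
    # newlines embedded in a prompt inside a single quoted field.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "completion"])
        writer.writeheader()
        for prompt, completion in pairs:
            writer.writerow({"prompt": prompt, "completion": completion})

# Example pair taken from the question above.
pairs = [
    (
        " Human: How do I use my favorite navigation app?\nAI:",
        " You can change the default navigation app at the system's settings"
        " at the navigation settings section.",
    ),
]
write_csv(pairs, "train.csv")
```

The csv module quotes fields that contain commas, quotes, or newlines, so long texts do not need manual escaping.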

Thanks, I finally got it to work.
I copied your example, but for some reason it gave me encoding issues, so I eventually copied the text from the guide into a JSON file as follows:
{"prompt": "", "completion": ""}
However, I think the CSV format would definitely be easier for large chunks of text, and I’m going to give it a try.
