Hi there!
I am currently trying to programmatically generate a dataset file (.jsonl) that I want to use for fine-tuning a GPT-3 model.
The output that is being generated currently looks like this:
[{"prompt":"Some input text", "completion":"Some completion text"}, {"prompt":"Another input text", "completion":"Another completion text"}]
In the documentation, I see that all the examples are written without the surrounding brackets [].
E.g.:
{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}
Now my question is: Do I need to remove the surrounding array brackets ([]) from my dataset.jsonl file before using it to fine-tune?
I just dump them one dict at a time. You can see one of my scripts here: AutoMuse2/format_jsonl.py in the daveshap/AutoMuse2 repository on GitHub.
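For anyone reading along, here is a minimal sketch of what "one dict at a time" could look like in Python. This is not the linked script; the function name `write_jsonl` and the file name `dataset.jsonl` are made up for illustration. Each record is serialized on its own line, with no enclosing array and no commas between records.

```python
import json


def write_jsonl(records, path):
    """Write each dict as one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps escapes newlines inside strings as \n,
            # so every record stays on a single physical line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    examples = [
        {"prompt": "Some input text", "completion": "Some completion text"},
        {"prompt": "Another input text", "completion": "Another completion text"},
    ]
    write_jsonl(examples, "dataset.jsonl")  # assumed output file name
```

The same idea should carry over to PHP: encode each record separately and append a newline, rather than encoding the whole array at once.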
Thanks, Dave! I’m currently generating the training data using PHP.
It’s not a problem for me to structure the output the way the documentation shows. I just wanted to hear whether it has any impact on output/accuracy.
Yes, you need to remove the [], otherwise it will break, as far as I know.
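If the file has already been generated as a single JSON array, a rough way to convert it is to parse the array and re-emit one object per line. The sketch below assumes the whole array fits in memory; `array_to_jsonl` is a hypothetical helper name, not part of any library.

```python
import json


def array_to_jsonl(array_text):
    """Convert the text of a JSON array of objects into JSONL text."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # One JSON object per line, no surrounding [] and no separating commas.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


if __name__ == "__main__":
    source = (
        '[{"prompt":"Some input text", "completion":"Some completion text"}, '
        '{"prompt":"Another input text", "completion":"Another completion text"}]'
    )
    print(array_to_jsonl(source), end="")
```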