Validate JSONL Training File


I have written a php connector to the api. Training works well, but not with all files.
After creating the finetuned job, I get this when I request the status of the current training: “The job failed due to an invalid training file”
I use gpt-3.5-turbo-0613. Linefeeds are correct, double quotes ecaped, etc.
The jsonl looks valid, according to php storm’s json validation function.

Are there any other tools, or scripts, to validate my jsonl? The message that I get from the api is not very clear, and doesn’t mention the line, that causes the problems.


Might be worth posting at least a sample of the training set, might be a common issue on every line.

1 Like


If you need a general approach I’ll suggest this example from the cookbook:


It’s propably this string: Und so funktioniert\'s, especially that part: \'s.
how to escape that correctly?

The file should have all strings in double-quotes, thus no need to escape an apostrophe.

The data is JSON, not python autoquotes.

This code should crash nicely on bad json: