Why can JSONL file validation fail when fine-tuning gpt-3.5-turbo-0125? Are there any logs to check?

Are you receiving an error simply when uploading and then monitoring the progress of file verification?

Here’s basic code to check token counts. It also parses each line as JSON, so it will fail on any line that is not valid JSON.

Change max_line (set to 52 here specifically to produce errors) to something smaller than your model’s context length. Glancing through, I think the overhead per message should also have been 4 instead of 3.
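A minimal sketch of that kind of check, not the exact code referred to above. It assumes the chat fine-tuning layout where each line is a JSON object with a "messages" list. The function name `check_jsonl_lines`, the `max_line` default of 4096, and the fallback word counter are placeholders for illustration, and the per-message overhead of 4 is the guess mentioned above, not a confirmed value.

```python
import json


def check_jsonl_lines(lines, max_line=4096, per_message_overhead=4, count_tokens=None):
    """Return (line_number, problem) tuples for fine-tuning JSONL lines."""
    if count_tokens is None:
        # Assumption: crude stand-in that counts whitespace-separated words,
        # not real model tokens. Pass a real tokenizer for accurate numbers.
        def count_tokens(text):
            return len(text.split())

    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        if not all(isinstance(m, dict) for m in messages):
            problems.append((number, "a message is not a JSON object"))
            continue
        # Estimated size: fixed overhead per message plus its content tokens.
        total = sum(
            per_message_overhead + count_tokens(str(m.get("content") or ""))
            for m in messages
        )
        if total > max_line:
            problems.append((number, f"about {total} tokens, over max_line={max_line}"))
    return problems
```

To use it, open your file with `encoding="utf-8"` and pass the file object as `lines`, then print whatever comes back. For real token counts, if you have tiktoken installed, you can pass something like `count_tokens=lambda t: len(tiktoken.get_encoding("cl100k_base").encode(t))`.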

There are also other characters that seem to have been refused in the past: bytes of 128 and above, outside ASCII (mostly accented characters), may train better once the file is fully converted to UTF-8.
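If you want to see whether that applies to your file, here is a rough sketch that flags lines which are not valid UTF-8 or contain non-ASCII characters, plus a one-line conversion. Both function names are made up for this example, and cp1252 as the source encoding is only an assumption; use whatever encoding your file was actually saved in.

```python
def find_encoding_problems(raw: bytes):
    """Return (line_number, note) tuples for lines with bytes outside ASCII."""
    report = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            report.append((number, "not valid UTF-8"))
            continue
        if not text.isascii():
            report.append((number, "valid UTF-8 with non-ASCII characters"))
    return report


def convert_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from an assumed legacy encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")
```

Read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in. Lines reported as "not valid UTF-8" are the ones to convert before uploading; lines that are already valid UTF-8 should not be run through the conversion again.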