I am trying to fine-tune GPT-3.5 Turbo with my prepared JSONL file. It went pretty well when I trained it with 20 examples, but it fails every time with the 320-example file. It shows “The job failed due to an invalid training file”, but I can’t see anything wrong with my file. Does anyone know the reason?
Or is there a token limit for the training file? Need some help.
A fine-tuning job allows up to 50 million training tokens, so I’m fairly sure the token limit is not the problem here.
OpenAI provides code that validates training files and reports whether the JSONL structure is valid or not. Have you used that?
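I don’t have the official script in front of me, but a minimal sketch of that kind of structural check (the function name `validate_jsonl` and the exact checks are my own, not OpenAI’s) could look like this:

```python
import json

def validate_jsonl(path):
    """Collect per-line errors: blank lines, invalid JSON, or records
    that lack the 'messages' list the chat fine-tuning format expects."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                errors.append(f"line {i}: blank line")
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {i}: invalid JSON ({e})")
                continue
            if not isinstance(obj.get("messages"), list):
                errors.append(f"line {i}: missing 'messages' list")
    return errors
```

Running it over your 320-example file should at least tell you *which* line the API is choking on, which is more useful than the generic “invalid training file” message.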
A training file can fail on unexpected things, such as characters in bytes 128–255 that normally would not need to be escaped. Raw tab characters inside your JSON can also be pesky. Multi-line inputs are right out (unlike several OpenAI fine-tune examples): every training conversation must sit on a single line, with quotes and newlines escaped inside the JSON strings.
I hope that helps you look deeper into your file and run more validation on the input you provide.
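If you want to hunt for those byte-level gremlins directly, here is a quick sketch (my own, not an official tool) that flags raw tabs, other control bytes, and non-ASCII bytes line by line, so you can eyeball the suspects:

```python
def scan_training_file(path):
    """Report (line_number, issue) pairs for raw tabs, other control
    bytes (which must be escaped inside JSON strings), and bytes in
    the 128-255 range, reading the file in binary to avoid decoding."""
    report = []
    with open(path, "rb") as f:
        for i, raw in enumerate(f, start=1):
            body = raw.rstrip(b"\r\n")
            if any(b == 0x09 for b in body):
                report.append((i, "raw tab"))
            if any(b < 0x20 and b != 0x09 for b in body):
                report.append((i, "control byte"))
            if any(b >= 0x80 for b in body):
                report.append((i, "non-ASCII byte"))
    return report
```

Non-ASCII bytes are legal in UTF-8 JSON, so hits there are just worth inspecting, not automatically wrong; raw control characters inside strings, on the other hand, are invalid JSON and a likely culprit.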