Help needed regarding Fine tuning

I’m currently facing an issue while trying to fine-tune a model with a custom training file. Despite following the documentation and ensuring the JSONL format seems correct, I keep encountering the following error message:

The job failed due to an invalid training file. Invalid file format. Example 29 At least one message must be from the assistant.

Each entry in my JSONL file follows the prescribed structure, with messages attributed to “system”, “user”, and “assistant” roles. Here’s a generalized example of the format I’m using:

"messages": [
{"role": "system", "content": "Description of the assistant's capabilities."},
{"role": "user", "content": "A user query."},
{"role": "assistant", "content": "The assistant's response."}
]

Despite this, I receive an error indicating an issue with Example 29, suggesting it lacks an assistant message, which isn’t the case upon inspection. I’ve checked for common JSON formatting errors, ensured UTF-8 encoding, and validated my JSONL structure with online tools, all without identifying the root cause.

I even made a Python script to check each entry, and everything seems fine on my end. The format’s correct, and I made sure of UTF-8 encoding.
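For anyone who wants to reproduce this kind of check, here is a minimal sketch of a per-line validator. It is not the script mentioned above. The function name `find_invalid_examples` is made up for illustration, and the checks only cover what the error message hints at (parseable JSON, a "messages" list, at least one assistant message). The line numbers it reports are 1-based and may be offset from the example index in the error message.

```python
import json


def find_invalid_examples(lines):
    """Return (line_number, reason) pairs for lines that fail basic checks.

    Hypothetical helper: the checks are assumptions based on the error
    message, not the service's actual validation rules.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            example = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list):
            problems.append((number, "missing 'messages' list"))
            continue
        has_assistant = any(
            isinstance(m, dict) and m.get("role") == "assistant" for m in messages
        )
        if not has_assistant:
            problems.append((number, "no assistant message"))
    return problems


# Small in-memory demo; the second line has no assistant message.
sample = [
    '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}',
    '{"messages": [{"role": "user", "content": "hi"}]}',
]
for number, reason in find_invalid_examples(sample):
    print(f"line {number}: {reason}")
```

To run it on a real file, pass the open file object (opened with encoding="utf-8") instead of the sample list, since iterating a file yields one line at a time.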

Has anyone seen this before or have tips? It’s confusing because the error keeps changing which example it points to.

Welcome to the community!

The general structure of your training dataset looks fine. In the examples the error points to, could you have included unescaped double quotation marks inside any of the system, user, or assistant messages, other than those at the beginning and end? Could a bracket be missing somewhere?
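One way to rule out quoting problems is to build each line with a JSON library instead of writing it by hand. The sketch below assumes Python's standard `json` module. The helper name `to_jsonl_line` and the sample strings are made up for illustration.

```python
import json


def to_jsonl_line(system, user, assistant):
    """Serialize one conversation as a single JSON line (hypothetical helper)."""
    example = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    # json.dumps escapes inner double quotes and newlines, so the result
    # always stays on one physical line.
    return json.dumps(example, ensure_ascii=False)


line = to_jsonl_line(
    "A helpful assistant.",
    'What does "JSONL" mean?',
    'It stands for "JSON Lines".\nOne object per line.',
)
print(line)
```

Because the library handles the escaping, quotes or line breaks inside a message cannot split an example across lines or end a string early.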


What you are showing is not the final format of a JSONL file for fine-tuning.

Every complete conversation example must be on a single very long line.

Examples are separated by line feed characters (\n), so each line of the file holds exactly one complete JSON object.

You can see if that helps.
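If the data currently exists as pretty-printed JSON objects spread over several lines, something like the following sketch could collapse it to one object per line. It assumes the input is a sequence of valid JSON objects separated only by whitespace. The function name `reflow_to_jsonl` is made up for illustration.

```python
import json


def reflow_to_jsonl(text):
    """Rewrite whitespace-separated JSON objects as one object per line.

    Hypothetical helper; raises json.JSONDecodeError if any object is invalid.
    """
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj, ensure_ascii=False))
    return "\n".join(lines) + "\n" if lines else ""


pretty = """{
  "messages": [
    {"role": "user", "content": "A user query."},
    {"role": "assistant", "content": "The assistant's response."}
  ]
}
"""
print(reflow_to_jsonl(pretty), end="")
```

The output can then be written to a .jsonl file with UTF-8 encoding.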

The data should be formatted like this, as shown in the docs:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}
