Help needed regarding Fine tuning

aalrabah1 · April 6, 2024, 4:36am

I’m currently facing an issue while trying to fine-tune a model with a custom training file. Despite following the documentation and checking that the JSONL format looks correct, I keep encountering the following error message:

The job failed due to an invalid training file. Invalid file format. Example 29 At least one message must be from the assistant.

Each entry in my JSONL file follows the prescribed structure, with messages attributed to “system”, “user”, and “assistant” roles. Here’s a generalized example of the format I’m using:

{
"messages": [
{"role": "system", "content": "Description of the assistant's capabilities."},
{"role": "user", "content": "A user query."},
{"role": "assistant", "content": "The assistant's response."}
]
}
Despite this, I receive an error indicating an issue with Example 29, suggesting it lacks an assistant message, which isn’t the case upon inspection. I’ve checked for common JSON formatting errors, ensured UTF-8 encoding, and validated my JSONL structure with online tools, all without identifying the root cause.

I even wrote a Python script to check each entry, and everything looks fine on my end: the format is correct and the file is UTF-8 encoded.

Has anyone seen this before or have tips? It’s confusing because the error keeps changing which example it points to.

jr.2509 · April 6, 2024, 6:24am

Welcome to the community!

The general structure of your training data set looks fine. For the examples the error points to, could you have included unescaped double quotation marks inside a system, user, or assistant message, in addition to the ones that open and close each string? Could a bracket be missing somewhere?
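
One way to rule out stray quotation marks is to let a JSON library write each line instead of assembling the text by hand. Here is a minimal sketch using Python's standard json module; the helper name make_example and the sample strings are made up for illustration and are not from the original post.

```python
import json


def make_example(system: str, user: str, assistant: str) -> str:
    """Serialize one chat example as a single line of JSON (hypothetical helper)."""
    record = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    # json.dumps escapes embedded double quotes and newlines,
    # so the result always stays on one physical line.
    return json.dumps(record, ensure_ascii=False)


# Illustrative content containing quotes and a line break.
line = make_example(
    "You are a helpful assistant.",
    'What does "JSONL" mean?',
    'It stands for "JSON Lines".\nOne object per line.',
)
print(line)
```

Writing every example through a helper like this means quotes inside the content can never close a string early.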

_j · April 6, 2024, 8:35am

What you are showing is not the final format of a JSONL file for fine-tuning.

Every complete conversation example must be on a single very long line.

When the file is split on line feed characters (\n), each resulting line should be one complete JSON object.

You can see if that helps.
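
If the file currently holds pretty-printed objects one after another, as in the example you posted, something like the following could collapse them to one object per line. This is only a sketch built on the standard json module; the function name to_jsonl is made up, and it assumes the file contains nothing but back-to-back JSON objects separated by whitespace.

```python
import json


def to_jsonl(text: str) -> str:
    """Re-serialize back-to-back JSON objects as one object per line (hypothetical helper)."""
    decoder = json.JSONDecoder()
    lines = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Parse one object starting at pos; pos moves to just past it.
        obj, pos = decoder.raw_decode(text, pos)
        lines.append(json.dumps(obj, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


pretty = """{
"messages": [
{"role": "user", "content": "A user query."},
{"role": "assistant", "content": "The assistant's response."}
]
}
{
"messages": [
{"role": "user", "content": "Another query."},
{"role": "assistant", "content": "Another response."}
]
}
"""
print(to_jsonl(pretty), end="")
```

If the input contains curly quotes or other invalid JSON, raw_decode raises json.JSONDecodeError, which at least shows where the file stops being parseable.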

Innovatix · April 6, 2024, 9:08am

The data should be formatted like this, as shown in the docs:

{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already."}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?"}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters."}]}

Learn more about Fine-tuning
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset
