Getting "does not appear to be in valid JSON format. Please ensure your file is formatted as a valid JSON file." every time

I am trying to fine-tune the model, but every time I get this error: "ERROR in read_any_format validator: Your file data.json does not appear to be in valid JSON format. Please ensure your file is formatted as a valid JSON file." I have checked the file, and it is in a valid format.

How have you checked the file? What validator?

Can you show a few lines of your dataset?

We need more info to be able to help.

Thanks!

Hi there, I’m getting the same error. I validated the JSON file using https://jsonlint.com/ and https://codebeautify.org/jsonvalidator.

See below a sample from the dataset:

[{
"prompt": "1AOFJE36",
"completion": "Can assist firms in organizational challenges"
},
{
"prompt": "1AOFJE35",
"completion": "Offers assistance to growing companies with their sales"
},
{
"prompt": "1AOFJE34",
"completion": "Specializes in supporting start-ups with their communication and public relations strategy"
}]

Hi @banditodegretta

Your data is not in JSONL format.

Please check the JSONL formatting specs and try again.

:slight_smile:

https://manifold.net/doc/mfd9/jsonl.htm

Not OP, but I am using openai tools fine_tunes.prepare_data which is meant to convert JSON to JSONL as far as I have understood, but I get the same error.

It is valid JSON though: the RFC 8259 validator on https://jsonformatter.curiousconcept.com/ says it's valid, and VSCode does not show any error. It is perfectly fine JSON, structured the way the input is supposed to be structured:

[
  {
    "prompt": "this is an example and not the real json",
    "completion": "ok"
  },
  {
    "prompt": "is this a valid json?",
    "completion": "yeah"
  }
]

I downgraded to 0.25.0 and it worked. Seems like an issue with the latest version.

Hey @gryt what exactly did you downgrade and how? I get the same error…

This data above is not properly formatted JSONL. Your data above is JSON formatted. The requirement is JSONL and NOT JSON.

There are no array brackets and NO commas between lines, and each JSONL entry MUST stand alone on one single line.

Below is JSONL format (no array brackets, no commas between JSON objects):

{"prompt": "1AOFJE36", "completion": "Can assist firms in organizational challenges"}
{"prompt": "1AOFJE35", "completion": "Offers assistance to growing companies with their sales"}
{"prompt": "1AOFJE34", "completion": "Specializes in supporting start-ups with their communication and public relations strategy"}
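
If you would rather convert the file yourself than reformat it by hand, here is a minimal Python sketch of the JSON-to-JSONL conversion described above. It only uses the standard json module. The function name json_array_to_jsonl is made up for illustration and is not part of the openai package, and it assumes the input is a top-level array of objects like the sample posted earlier.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Turn a JSON array of objects into JSONL text (one object per line).

    Hypothetical helper for illustration only; not an OpenAI API.
    """
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of objects")
    # json.dumps without indent never emits newlines, so each record
    # ends up on exactly one line, with no brackets or trailing commas.
    lines = [json.dumps(record, ensure_ascii=False) for record in records]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = """[
      {"prompt": "1AOFJE36", "completion": "Can assist firms in organizational challenges"},
      {"prompt": "1AOFJE35", "completion": "Offers assistance to growing companies with their sales"}
    ]"""
    print(json_array_to_jsonl(sample), end="")
```

To use it on a real file, read data.json into a string, pass it to the function, and write the result to a .jsonl file.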

Hope this helps.

:slight_smile:

The above data is valid JSON not JSONL. You cannot send validated JSON data to a fine-tuning process. The fine-tuning will fail. The requirement is for JSONL data, not JSON.

Your data must be JSONL formatted, as follows:

{"prompt": "this is an example and not the real json", "completion": "ok"}
{"prompt": "is this a valid json?", "completion": "yeah"}

The requirement for fine-tuned data is JSONL, and not JSON.

Hope this helps.

:slight_smile:

Please read what people are saying before trying to help them. fine_tunes.prepare_data does NOT expect JSONL - it converts other files INTO JSONL automatically and performs additional improvements on the file.

If you are using the CLI like I am and you are trying to use fine_tunes.prepare_data, then downgrade the openai package to version 0.25.0 with: pip install openai==0.25.0

Sorry, but the OP did not specify the problem was because of that poorly written CLI which I (and most devs here) never use.

But it does seem that most here who do use the CLI are constantly having issues.

Sorry again, but I never use the CLI and do not recommend it either.

Take care and good luck @gryt :four_leaf_clover:
:slight_smile:

I’m new here & I’ve been trying to figure out how to train the model on my own data.

If you’re not using the CLI, what are you guys using instead?

I get an error, but it's saying "Does not appear to be in valid JSONL format."

It’s definitely valid JSONL format.

Is the error wrong? Is my formatting wrong (it’s not)? Is something wrong with the API?
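
One way to narrow this down is to check the file line by line yourself. The sketch below is a rough Python check, not a reproduction of whatever validator the API or CLI runs. The function name find_jsonl_problems is invented for this example, and the "prompt" and "completion" keys are assumed from the samples posted in this thread. It reports the line numbers that fail to parse on their own, which also catches curly quotes pasted in from a web page or a word processor.

```python
import json


def find_jsonl_problems(text: str) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that are not
    standalone JSON objects with 'prompt' and 'completion' keys.

    Hypothetical helper; the real validator may apply other rules.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((lineno, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Array brackets, trailing commas, and curly quotes all land here.
            problems.append((lineno, f"not valid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "line is not a JSON object"))
        elif not {"prompt", "completion"} <= record.keys():
            problems.append((lineno, "missing 'prompt' or 'completion' key"))
    return problems


if __name__ == "__main__":
    text = '{"prompt": "a", "completion": "b"}\n{“prompt”: “c”, “completion”: “d”}\n'
    for lineno, problem in find_jsonl_problems(text):
        print(f"line {lineno}: {problem}")
```

If this reports nothing for your file and the upload still fails, that points away from your formatting and toward the tooling, such as the CLI version issue mentioned earlier in the thread.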

I swear to god there has not been a single step of using this API that has gone smoothly. Fmtgdt.