Getting "does not appear to be in valid JSON format. Please ensure your file is formatted as a valid JSON file." every time

I am trying to fine-tune the model, but every time I get: ERROR in read_any_format validator: Your file data.json does not appear to be in valid JSON format. Please ensure your file is formatted as a valid JSON file. I have checked the file, and it is in a valid format.

How have you checked the file? What validator?

Can you show a few lines of your dataset?

We need more info to be able to help.

Thanks!

Hi there, I’m getting the same error. I validated the JSON file using https://jsonlint.com/ and https://codebeautify.org/jsonvalidator.

See below a sample from the dataset:

[{
"prompt": "1AOFJE36",
"completion": "Can assist firms in organizational challenges"
},
{
"prompt": "1AOFJE35",
"completion": "Offers assistance to growing companies with their sales"
},
{
"prompt": "1AOFJE34",
"completion": "Specializes in supporting start-ups with their communication and public relations strategy"
}]

Hi @banditodegretta

Your data is not in JSONL format.

Please check the JSONL formatting specs and try again.

:slight_smile:

https://manifold.net/doc/mfd9/jsonl.htm

Not OP, but I am using openai tools fine_tunes.prepare_data which is meant to convert JSON to JSONL as far as I have understood, but I get the same error.

It is valid JSON though: the RFC 8259 validator on https://jsonformatter.curiousconcept.com/ says it's valid, and VSCode does not show any error. It is perfectly fine JSON, structured the way the input is supposed to be structured:

[
  {
    "prompt": "this is an example and not the real json",
    "completion": "ok"
  },
  {
    "prompt": "is this a valid json?",
    "completion": "yeah"
  }
]

I downgraded to 0.25.0 and it worked. Seems like an issue with the latest version.

Hey @gryt what exactly did you downgrade and how? I get the same error…

The data above is not properly formatted JSONL; it is formatted as JSON. The requirement is JSONL and NOT JSON.

There are no array brackets and NO commas between lines, and each JSONL entry MUST stand alone on a single line.

Below is the JSONL format (no array brackets, no commas between JSON objects):

{"prompt": "1AOFJE36", "completion": "Can assist firms in organizational challenges"}
{"prompt": "1AOFJE35", "completion": "Offers assistance to growing companies with their sales"}
{"prompt": "1AOFJE34", "completion": "Specializes in supporting start-ups with their communication and public relations strategy"}
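
For anyone who wants to do the conversion without the CLI, here is a minimal sketch using only Python's standard `json` module. It assumes the input is a JSON array of objects like the sample earlier in the thread; the function name `json_array_to_jsonl` is made up for illustration and is not part of any OpenAI tool.

```python
import json


def json_array_to_jsonl(text: str) -> str:
    """Convert a JSON array of objects into JSONL text (one object per line)."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without `indent` never emits newlines, so every record
    # ends up on exactly one line.
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


sample = """
[{
"prompt": "1AOFJE36",
"completion": "Can assist firms in organizational challenges"
},
{
"prompt": "1AOFJE35",
"completion": "Offers assistance to growing companies with their sales"
}]
"""

print(json_array_to_jsonl(sample), end="")
```

To use it on a file, read the file's text, pass it through the function, and write the result to a new `.jsonl` file.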

Hope this helps.

:slight_smile:

The above data is valid JSON, not JSONL. You cannot send validated JSON data to a fine-tuning process; the fine-tuning will fail. The requirement is for JSONL data, not JSON.

Your data must be JSONL formatted, as follows:

{"prompt": "this is an example and not the real json", "completion": "ok"}
{"prompt": "is this a valid json?", "completion": "yeah"}

The requirement for fine-tuned data is JSONL, and not JSON.
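
A quick way to tell the two formats apart locally is to parse the file line by line. The sketch below is only an illustration using the standard `json` module: the helper name `find_jsonl_problems` is hypothetical, and the `prompt`/`completion` key check is an assumption based on the samples in this thread, not the CLI's actual validation logic.

```python
import json


def find_jsonl_problems(text: str, required=("prompt", "completion")) -> list[str]:
    """Return a list of human-readable problems; an empty list means the text looks like JSONL."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch simply ignores blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: not valid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object, got {type(record).__name__}")
            continue
        missing = [key for key in required if key not in record]
        if missing:
            problems.append(f"line {lineno}: missing keys {missing}")
    return problems


jsonl_text = '{"prompt": "is this a valid json?", "completion": "yeah"}\n'
json_text = '[\n  {\n    "prompt": "is this a valid json?",\n    "completion": "yeah"\n  }\n]\n'

print(find_jsonl_problems(jsonl_text))  # no problems reported for JSONL
print(find_jsonl_problems(json_text))   # a pretty-printed JSON array fails on every line
```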

Hope this helps.

:slight_smile:

Please read what people are saying before trying to help them. fine_tunes.prepare_data does NOT expect JSONL - it converts other files INTO JSONL automatically and performs additional improvements on the file.

If you are using the CLI like I am and are trying to use "fine_tunes.prepare_data", then downgrade the CLI to version 0.25.0 with: pip install openai==0.25.0

Sorry, but the OP did not specify the problem was because of that poorly written CLI which I (and most devs here) never use.

But it does seem that most here who do use the CLI are constantly having issues.

Sorry again, but I never use the CLI and do not recommend it either.

Take care and good luck @gryt :four_leaf_clover:
:slight_smile:

I’m new here & I’ve been trying to figure out how to train the model on my own data.

If you’re not using the CLI, what are you guys using instead?

I get an error, but it's saying "Does not appear to be in valid JSONL format."

It’s definitely valid JSONL format.

Is the error wrong? Is my formatting wrong (it’s not)? Is something wrong with the API?

I swear to god there has not been a single step of using this API that has gone smoothly. Fmtgdt.

The problem is the read_any_format function in the OpenAI CLI. If you provide your input without indentation, it will work. I had the same problem.

So instead of

[
    {
        "prompt": "your prompt",
        "completion": "your completion"
    }
]

do…

[{"prompt": "your prompt","completion": "your completion"}]
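
If the file is large, removing the indentation by hand is tedious. Here is a small sketch that re-serializes the JSON on a single line with the standard `json` module. Whether this is enough to get past read_any_format is only what was reported above; the helper name `compact_json` is hypothetical.

```python
import json


def compact_json(text: str) -> str:
    """Re-serialize JSON text on a single line, with no indentation or extra spaces."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


indented = """
[
    {
        "prompt": "your prompt",
        "completion": "your completion"
    }
]
"""

print(compact_json(indented))
```

The `separators=(",", ":")` argument drops the spaces that `json.dumps` normally puts after commas and colons; leaving it out still produces a single line.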

The code above is valid JSON, but not valid JSONL. Be careful when using CLIs and API wrappers that hide the API requirements.

I solved this issue by changing the file encoding: I converted the .jsonl file from UTF-8 with BOM to plain UTF-8 using VS Code (the encoding option is in the bottom right).

I have openai CLI version 0.27.5, and was using the fine_tunes.prepare_data command.
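
The same conversion can be done without an editor. This is a minimal sketch using only the standard library; the helper names and the `data.jsonl` path in the comment are placeholders, not anything the CLI provides.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def rewrite_without_bom(path: str) -> bool:
    """Rewrite the file in place without a BOM; return True if the file was changed."""
    file_path = Path(path)
    raw = file_path.read_bytes()
    cleaned = strip_utf8_bom(raw)
    if cleaned == raw:
        return False
    file_path.write_bytes(cleaned)
    return True


# Example (hypothetical file name):
# rewrite_without_bom("data.jsonl")
```

Working on raw bytes avoids re-encoding the rest of the file, so only the three BOM bytes at the start are removed.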
