Failing to fine-tune a model: invalid JSONL format no matter what

Hi,

I’m trying to fine-tune a model, but for some reason it always fails and tells me that my file does not appear to be in valid JSON format.

I have tried both .jsonl and .json, but both fail. I am running the following:

openai tools fine_tunes.prepare_data -f my_file.json

Here are a few examples that I tried. The only difference between them is the formatting (hopefully that is preserved here):

{
    "prompt": "Hi, I can't connect to the board. Can you please help? I'm in SS4. Thank you, Adam", 
    "completion": "Hello Adam, We will try to get someone to see you as quickly as possible, but in the mean time, please try the following: Make sure that the docking station is receiving power. You can tell by the small light visible on the docking station after you connect your device to it."
}

and

{
    "prompt": "Hi, I can't connect to the board. Can you please help? I'm in SS4. Thank you, Adam", 
    "completion": "Hello Adam, We will try to get someone to see you as quickly as possible, but in the mean time, please try the following: Make sure that the docking station is receiving power. You can tell by the small light visible on the docking station after you connect your device to it."
}

and

{"prompt": "Hi, I can't connect to the board. Can you please help? I'm in SS4. Thank you, Adam", "completion": "Hello Adam, We will try to get someone to see you as quickly as possible, but in the mean time, please try the following: Make sure that the docking station is receiving power. You can tell by the small light visible on the docking station after you connect your device to it."}

Any ideas what’s wrong with the file?

You must use the .jsonl extension.

According to my “under construction” validator, two of your three JSONL files are not valid.



Also, the JSONL spec requires each entry to be on a single line, so that makes sense. Two of your attempts span four lines, so they will not validate.
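A minimal sketch of the kind of line-by-line check a JSONL validator performs, written in Ruby to match the code used later in this thread. It is not the validator mentioned above; the method name `jsonl_errors` is hypothetical. Every line must parse on its own as a JSON object, which is exactly why a pretty-printed, multi-line object fails.

```ruby
require "json"

# Return [line_number, message] pairs for every line that is not a valid
# JSONL record. In JSONL, each line must parse on its own as a JSON value;
# for fine-tuning data we additionally require that value to be an object.
def jsonl_errors(text)
  errors = []
  text.each_line.with_index(1) do |line, number|
    begin
      record = JSON.parse(line)
      errors << [number, "not a JSON object"] unless record.is_a?(Hash)
    rescue JSON::ParserError => e
      errors << [number, e.message]
    end
  end
  errors
end

single = %({"prompt": "Hi", "completion": " Hello"}\n)
multi  = %({\n    "prompt": "Hi",\n    "completion": " Hello"\n}\n)

p jsonl_errors(single).empty?      # => true
p jsonl_errors(multi).map(&:first) # => [1, 2, 3, 4]
```

The same object is valid when it sits on one line and invalid when it is spread over four, because none of the four fragments is a complete JSON object by itself.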

Maybe try the validated (single-line) JSONL file with the correct .jsonl extension and see how it goes?

HTH

Thank you for the reply

Please see the following example of the file, this time saved as .jsonl; I made sure that it spans only a single line. The image also shows the terminal output:

Here is the prompt copy-pasted from the file as is:

{"prompt": "Hi, I can't connect to the board. Can you please help? I'm in SS4. Thank you, Adam", "completion": "Hello Adam, We will try to get someone to see you as quickly as possible, but in the mean time, please try the following: Make sure that the docking station is receiving power. You can tell by the small light visible on the docking station after you connect your device to it."}

I wonder if it perhaps dislikes my environment. I’m on an older version of Windows 10, creating the file in VS Code and running the fine-tuning via Git Bash.

My experience is that it is best (fewer errors and headaches) to use the full path to the file in your File API call.

/the/full/path/to/your/file/data_train.jsonl
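In Ruby, one way to follow this advice is to expand a relative file name into an absolute path and confirm that the file exists before making the API call, so a wrong location fails with a clear message. This is a sketch; the helper name `absolute_training_path` is hypothetical, while `File.expand_path`, `File.file?`, and `Dir.mktmpdir` are standard Ruby.

```ruby
require "tmpdir"

# Expand a (possibly relative) file name against base_dir and fail early,
# with a readable message, if no regular file exists at that location.
def absolute_training_path(name, base_dir = Dir.pwd)
  path = File.expand_path(name, base_dir)
  raise ArgumentError, "training file not found: #{path}" unless File.file?(path)
  path
end

# Usage: create a throwaway training file in a temporary directory,
# then resolve it to the full path you would pass to the upload call.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "data_train.jsonl"), %({"prompt": "Hi", "completion": " Hello"}\n))
  puts absolute_training_path("data_train.jsonl", dir)
end
```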

The API error messages are beta and hit or miss (mostly miss), so I discovered this “the hard way”.

Here is how I do it, as an example:

require 'json'

module Files
    # Configure the ruby-openai gem with the API key and return a client.
    def self.get_client
        Ruby::OpenAI.configure do |config|
            config.access_token = ENV.fetch('OPENAI_API_KEY')
        end
        OpenAI::Client.new
    end

    # Upload a training file using its full path; returns the new file's id.
    def self.upload(filename = "#{Rails.root}/app/assets/files/fine-tune.jsonl", purpose = 'fine-tune')
        client = get_client
        response = client.files.upload(
            parameters: {
                file: filename,
                purpose: purpose,
            })
        JSON.parse(response.body)["id"]
    end
end

Footnote

Please note, @Arivald: even if your data passes JSONL validation, your text must follow a specific format for fine-tuning. Your current JSONL text will pass JSONL validation, but it does not “pass” the API requirements (there is no validator for this, but I have one).

Most Developers Seem to Miss this Requirement

Reference:

Preparing Your Dataset

Here is an example of a fully “API Validated” entry. Note that I use “PROMPT_SEPARATOR” and “STOPSTOP” in this example; you can choose whatever markers you like.
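As a rough sketch of such an entry: the legacy prompt/completion guidance is that each prompt ends with a fixed separator and each completion starts with a whitespace and ends with a fixed stop sequence. The Ruby below builds one JSONL line that way. The method name `fine_tune_entry` and the exact spacing around the markers are assumptions for illustration, not the format used in the example above.

```ruby
require "json"

# Markers taken from the post above; any fixed strings that never occur
# in the normal text would work. The leading spaces are an assumption.
PROMPT_SEPARATOR = " PROMPT_SEPARATOR"
STOP_SEQUENCE    = " STOPSTOP"

# Build one JSONL line: the prompt ends with the separator, and the
# completion starts with a space and ends with the stop sequence.
def fine_tune_entry(prompt, completion)
  JSON.generate(
    "prompt"     => prompt.strip + PROMPT_SEPARATOR,
    "completion" => " " + completion.strip + STOP_SEQUENCE
  )
end

line = fine_tune_entry(
  "Hi, I can't connect to the board. Can you please help?",
  "Make sure that the docking station is receiving power."
)
puts line
```

`JSON.generate` always emits a single line, so each generated entry is already valid JSONL.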


Hi @ruby_coder, could you also look at my dataset? I was trying to figure out why my JSONL file is not being accepted as a valid JSONL file, and you seem to be answering all the questions.

OK. Please post here using Markdown triple backticks like this:

```
# your json data here
```

I will try to test it for you between sanding sessions today; I’m hand-sanding my teak wood floor (almost finished after a year, yay!).

Coding is fun. Sanding a super-hard teak floor by hand with a 5" random orbital sander, over the course of a year, is not fun.

HTH

:slight_smile:

I replaced parts of the text with “…” because they are very long.

{"prompt": "You are a water expert and rewrite the product description to state what it is, what does it do and how it works. Keep it technical and factual, don’t use sales language or too many adjectives. Write in third person view.\nProduct name: SigaPlatform\nProduct description: SigaPlatform is an AI-driven predictive maintenance and cyber security platform, designed to protect critical industrial assets (pumps, valves, etc) at the operational technology (OT) level, ...  to equipment, people or the environment.\n\n###\n\n", "completion": "SigaPlatform is an AI-driven cyber security  ... and enabling full regulatory compliance.\n####"},
{"prompt": "You are a water expert and rewrite the product description to state what it is, what does it do and how it works. Keep it technical and factual, don’t use sales language or too many adjectives. Write in third person view.\nProduct name: Mobile Organic Biofilm (MOB) Process\nProduct description: The Mobile Organic Biofilm, or the MOBTM, is Nuvoda's proprietary ... time needed for retrofit.\n\n###\n\n", "completion": "The Mobile Organic Biofilm (MOB) process increases ... aerobic granular sludge (AGS).\n####"}

Do I have to put everything in quotes (“”)? Is there supposed to be a comma after every fine-tuning example?
Sorry, I’m a new programmer and was thrown into learning AI.

No. JSONL does not use commas between the objects (hashes), and no brackets are needed to designate an array.
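To make the difference concrete: if your examples start out as a JSON array (brackets around everything, commas between objects), a short Ruby sketch can rewrite them as JSONL, one compact object per line with no brackets and no separating commas. The method name `json_array_to_jsonl` and the sample data are hypothetical.

```ruby
require "json"

# Convert a JSON array of objects into JSONL text: one compact JSON object
# per line, no surrounding brackets, no commas between records.
def json_array_to_jsonl(json_text)
  records = JSON.parse(json_text)
  raise ArgumentError, "expected a JSON array" unless records.is_a?(Array)
  records.map { |record| JSON.generate(record) + "\n" }.join
end

array_json = <<~JSON
  [
    {"prompt": "Question one ->", "completion": " Answer one END"},
    {"prompt": "Question two ->", "completion": " Answer two END"}
  ]
JSON

print json_array_to_jsonl(array_json)
```

The input is valid JSON but not valid JSONL; the output is the reverse, which is the distinction between the two formats.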

Checking now before I get back to sanding…

Hold on.

Hi @dc.vistro

Your JSONL data will not validate because you have a comma at the end of your line(s).

There are no commas at the end of a JSONL line.

JSONL validation is different from JSON validation.

Hope that helps

:slight_smile:

Note: if I remove the errant comma at the end of each line, your data validates as JSONL; and if I also add a space at the beginning of your completion, it passes OpenAI fine-tuning validation as well.
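Both fixes can be automated. A minimal Ruby sketch, assuming each record already sits on its own line and the only JSONL problem is a stray comma after the closing brace: it drops that comma, then prefixes the completion with a space unless it already starts with whitespace. The method name `repair_jsonl_line` is hypothetical, and the sample line is a shortened stand-in for the data above.

```ruby
require "json"

# Repair one line of fine-tuning data:
# 1. drop a stray comma after the closing brace (not allowed in JSONL);
# 2. make sure the completion starts with whitespace.
def repair_jsonl_line(line)
  cleaned = line.strip.sub(/,\z/, "")
  record = JSON.parse(cleaned)
  completion = record["completion"].to_s
  record["completion"] = " " + completion unless completion.match?(/\A\s/)
  JSON.generate(record)
end

# Single quotes keep the \n sequences as JSON escapes, as in the file.
broken = '{"prompt": "Product name: SigaPlatform\n\n###\n\n", "completion": "SigaPlatform is ...\n####"},'
puts repair_jsonl_line(broken)
```

Running it over every line of the file (for example with `File.foreach`) and writing the results to a new .jsonl file would repair the whole dataset.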

Thank you very much @ruby_coder. I will try your suggestion.

You are welcome, @dc.vistro

I will not be around for most of the rest of the day (heading to the gym), so if you run into more issues, please review this post here in our community:

:slight_smile:

I’m using:

openai tools fine_tunes.prepare_data -f (file location)

I edited the JSONL file as you suggested, and I’m still getting:

ERROR in read_any_format validator: Your file (file location) does not appear to be in valid JSONL format. Please ensure your file is formatted as a valid JSONL file.

Sorry, @dc.vistro, I deleted the openai CLI tools less than an hour after installing and testing them.

So I cannot help you with the CLI.

I am sure others can help you use the CLI much better than me!

:slight_smile: :slight_smile:

It’s alright. Thank you for your help.

You are welcome.

You will get an error similar to the one you are seeing if your code does not point to a valid file location.

OpenAI error messages are often obscure and can be misleading due to the “beta” nature of the API release.

:slight_smile:

OBTW, the correct CLI syntax is:

openai tools fine_tunes.prepare_data -f <LOCAL_FILE>

So you should make sure the path to your <LOCAL_FILE> is correct.

See:

CLI data preparation tool