Can someone please validate my training JSONL file for fine-tuning?

I am using the following files to fine-tune the Curie model, but the job fails without giving a reason. Can someone please look at my training files and check whether they are correct? The JSONL data and the log are below.

{"prompt":"Someone has taken the toilet seat, it is missing from the guys bathroom. ->","completion":" Plumbing"}
{"prompt":"need monitors ->","completion":" AV-Equipment"}
{"prompt":"New brand radio system required ->","completion":" AV-Equipment"}
{"prompt":"septic tank overflowing ->","completion":" Plumbing"}
{"prompt":"Poo on the toilet seat ->","completion":" Plumbing"}
{"prompt":"Sewage chamber blocked ->","completion":" Plumbing"}
{"prompt":"flooded branch ->","completion":" Plumbing"}
{"prompt":"Voice controls not working ->","completion":" AV-Equipment"}
{"prompt":"TV's to be moved ->","completion":" AV-Equipment"}
{"prompt":"fit projector to the ceiling ->","completion":" AV-Equipment"}
{"prompt":"Microphones are echoing ->","completion":" AV-Equipment"}
{"prompt":"Issue with water pressure ->","completion":" Plumbing"}

Error log:
"id": "ft-Y1ZDAcaGytn8zwPv0flNoxLF",
"model": "curie",
"object": "fine-tune",
"organization_id": "org-esS5wu4sjqoLntKEKVIWKRYb",
"result_files": [],
"status": "failed",
"training_files": [
{
"bytes": 12216,
"created_at": 1677337282,
"filename": ".\plumbing_prepared_train (1).jsonl",
"id": "file-fgkdlKNGBDz0SVz1oxpciFG8",
"object": "file",
"purpose": "fine-tune",
"status": "processed",
"status_details": null
}
],
"updated_at": 1677337774,
"validation_files": [
{
"bytes": 3038,
"created_at": 1677337284,
"filename": ".\plumbing_prepared_valid (1).jsonl",
"id": "file-t9r0oPnxWhcjntOQHQGQJpLm",
"object": "file",
"purpose": "fine-tune",
"status": "processed",
"status_details": null
}
]
},
{
"created_at": 1677499304,
"fine_tuned_model": null,
"hyperparams": {
"batch_size": null,
"classification_n_classes": 3,
"compute_classification_metrics": true,
"learning_rate_multiplier": null,
"n_epochs": 4,
"prompt_loss_weight": 0.01
},
"id": "ft-CvMKTZQne0xHWZeYXKEWRSyw",
"model": "curie",
"object": "fine-tune",
"organization_id": "org-esS5wu4sjqoLntKEKVIWKRYb",
"result_files": [],
"status": "failed",
"training_files": [
{
"bytes": 22180,
"created_at": 1677499302,
"filename": ".\ibm_intent_prepared_train.jsonl",
"id": "file-BjgFXS9XcltrEPKQz123nICO",
"object": "file",
"purpose": "fine-tune",
"status": "processed",
"status_details": null
}
],
"updated_at": 1677499837,
"validation_files": [
{
"bytes": 5851,
"created_at": 1677499304,
"filename": ".\ibm_intent_prepared_valid.jsonl",
"id": "file-UE0rHdx7gGEmblkrjxpBOjnt",
"object": "file",
"status": "processed"
}
]

The first thing I notice is that every line of your JSONL data is missing a stop sequence at the end of its completion value.

However, that should not cause the FT process to fail.
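For anyone who wants to check a file like this locally before uploading, here is a minimal sketch of a validator. It is not the OpenAI data preparation tool and not the Rails app mentioned later in this thread. The function name `validate_jsonl` is made up for illustration, the separator `" ->"` is taken from the data in the question, and the stop sequence `"\n"` is only an assumption (use whatever stop you actually train with).

```python
import json


def validate_jsonl(lines, separator=" ->", stop="\n"):
    """Return (line_number, message) pairs for problems found in the records.

    `separator` and `stop` are assumptions; pass the values your data uses.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        prompt = record.get("prompt")
        completion = record.get("completion")
        if not isinstance(prompt, str) or not isinstance(completion, str):
            problems.append((number, "missing string 'prompt' or 'completion'"))
            continue
        if not prompt.endswith(separator):
            problems.append((number, "prompt does not end with the separator"))
        if not completion.startswith(" "):
            problems.append((number, "completion does not start with a space"))
        if not completion.endswith(stop):
            problems.append((number, "completion does not end with the stop sequence"))
    return problems


# Two lines copied from the question; neither has a stop sequence.
sample = [
    '{"prompt":"need monitors ->","completion":" AV-Equipment"}',
    '{"prompt":"septic tank overflowing ->","completion":" Plumbing"}',
]
for number, message in validate_jsonl(sample):
    print(f"line {number}: {message}")
```

To check a real file, pass the open file object, for example `validate_jsonl(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder path.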

Let me try… hold on.

HTH

:slight_smile:

Update (pending):

Here ya go @sourabhsardesai40

Succeeded

Setup, your exact data:

HTH

:slight_smile:

Note: I tested your prompt against your fine-tuned model, and it has problems because there is no stop.

Also, you probably need to increase n_epochs to get a better model fit.

HTH

:slight_smile:

Update:

So, I’m now re-fine-tuning your data (with a stop) for 12 epochs (n_epochs = 12), just for you @sourabhsardesai40

12 epochs will take some time…
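In case it helps readers who cannot see the screenshot, adding a stop to every completion can be sketched roughly like this. The exact stop used in the run above is not shown in the text, so `" END"` below is purely a placeholder, and `add_stop` is a hypothetical helper name.

```python
import json


def add_stop(lines, stop=" END"):
    """Return JSONL lines whose completions all end with `stop`.

    `stop` is a placeholder value; lines that already end with it are
    left unchanged, so running the function twice is harmless.
    """
    fixed = []
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        record = json.loads(raw)
        if not record["completion"].endswith(stop):
            record["completion"] += stop
        fixed.append(json.dumps(record, ensure_ascii=False))
    return fixed


original = ['{"prompt":"flooded branch ->","completion":" Plumbing"}']
for line in add_stop(original):
    print(line)
```

Whatever stop you pick, it should be the same string on every line and the same string you later pass as the stop at completion time.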

:slight_smile:

Thank you so much @ruby_coder :heart:

@ruby_coder, could you also please let me know which tool you are using to create the JSONL files? I mean the one shown in the screenshot.

Sure. It is a Ruby on Rails project I created (that runs on localhost on my workstation) to help people like you in our community who have problems.

Going to have dinner now. Back in a few hours.

See:

Great, thanks. I will explore it until you are back :slight_smile:

Did this work, by any chance? Just checking :slight_smile:

The cake has baked:

Testing completion now.

Yes, it works:

Prompt Setup 1

Completion 1

Prompt Setup 2

Completion 2

So, “there ya go” @sourabhsardesai40. 100% expected results based on your original training data.

It’s just a matter of understanding how to set up your training file correctly (in your case, you were missing stops, as I pointed out in my initial response), making sure you meet the OpenAI API fine-tuning dataset formatting requirements, and then running your model iteratively and “actually fine-tuning it” until you get the model fit that your use case requires.
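The same formatting has to be mirrored when you call the fine-tuned model: the prompt should end with the separator used in training, and the output should be cut at the stop. A rough sketch of that idea with two hypothetical helpers follows. It makes no API call, the `" ->"` separator comes from the training data in the question, and the `"\n"` stop is an assumption.

```python
def build_prompt(text, separator=" ->"):
    """Format user text the same way the training prompts were formatted."""
    text = text.strip()
    if text.endswith(separator.strip()):
        return text  # separator already present, do not add it twice
    return text + separator


def extract_label(completion, stop="\n"):
    """Keep only the text before the first stop and trim the leading space."""
    head, _, _ = completion.partition(stop)
    return head.strip()


print(build_prompt("Issue with water pressure"))
print(extract_label(" Plumbing\n Plumbing\n AV-Equipment"))
```

If you pass the stop in the completion request itself, the model output should already end at the stop, so `extract_label` is only a safety net for trimming whitespace.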

You may also find these two “hands on, lab” tutorials (with data scientist comments) helpful:

  1. Fine-Tuning In a Nutshell with a Single Line JSONL File and n_epochs

  2. Model Fitting In a Nutshell with a Single Line JSONL File and n_epochs

Best of luck @sourabhsardesai40 and hope this helped you and you learned something new today.

:slight_smile:

Thank you so much @ruby_coder, it worked on my side as well. Thank you once again for your guidance :heart:
