Attempting to fine-tune curie. I submit the .jsonl file and the API goes from pending to failed, but it does not give a reason for the failure. Now I'm not sure what to do. Do I keep trying?
'updated_at': 1674912661,
'validation_files': []},
{'created_at': 1674913222,
'fine_tuned_model': None,
'hyperparams': {'batch_size': None,
'learning_rate_multiplier': None,
'n_epochs': 4,
'prompt_loss_weight': 0.01},
'id': 'ft-AKZ9a6vX8oSz9gjadD687pFB',
'model': 'curie',
'object': 'fine-tune',
'organization_id': 'org-DjoSJnnXM2aP8lMapNzkRlmz',
'result_files': [],
'status': 'failed',
'training_files': [{'bytes': 4116449,
'created_at': 1674913222,
'filename': 'file',
'id': 'file-YJDBWDO71yFPZYurFX9JuxsW',
'object': 'file',
'purpose': 'fine-tune',
'status': 'processed',
'status_details': None}],
'updated_at': 1674913270,
'validation_files': []}],
'object': 'list'}
Did you check your JSONL file for errors? Are the keys correct?
Example
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> #####"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> #####"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> #####"}
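Rather than eyeballing the file, you can also parse each line with Ruby's standard JSON library and check the keys programmatically. A minimal sketch (the helper name and the exact key check are my own; adjust to your format):

```ruby
require "json"

# Check each line of a JSONL string: it must parse as JSON and contain
# exactly the "prompt" and "completion" keys. Returns an array of
# [line_number, error_message] pairs; an empty array means all lines passed.
def jsonl_key_errors(data)
  errors = []
  data.each_line.with_index(1) do |line, n|
    begin
      obj = JSON.parse(line)
      unless obj.is_a?(Hash) && obj.keys.sort == ["completion", "prompt"]
        errors << [n, "unexpected keys: #{obj.keys.inspect}"]
      end
    rescue JSON::ParserError => e
      errors << [n, "invalid JSON: #{e.message}"]
    end
  end
  errors
end
```

Usage would be something like `jsonl_key_errors(File.read("train.jsonl"))`, which points you at the exact line numbers that need attention.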
Reference: Preparing your dataset (OpenAI fine-tuning guide)
It’s a large file, but I did format it the way you are showing. Is there an online service somewhere that validates .jsonl?
Not that I know of. All my prior searches have turned up only JSON validators.
As you know, these JSON validators will not validate JSONL, but maybe someone else knows of one which works well for JSONL?
Sorry, I have searched before and also today, and came up empty.
Thanks for your help. I did a quick visual check of the jsonl and I don’t see any issues. I tried running the fine-tune again with the same result (it fails but gives no reason for the failure).
Now I don’t know what to do except try a small portion of the jsonl, but this is going to get expensive if I have to step through the jsonl to find the error…
Yeah, this is a problem many are experiencing.
There is no JSONL validator specific to OpenAI fine-tuning.
The error messages from the API are still immature (beta) and not very helpful.
In my “still working on it” OpenAI Lab app, I was planning on writing my own validator, but it’s further down the development timeline. Still working out the kinks in the workflow…
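Before splitting the file into small chunks, it may also be worth pulling the job's event log: the fine-tunes API exposes a `GET /v1/fine-tunes/{id}/events` endpoint, and the events sometimes carry more detail than the bare "failed" status. A rough Ruby sketch, not tested against a live key, so verify the endpoint against the current API reference:

```ruby
require "net/http"
require "json"

# Build the events URL for a fine-tune job (fine-tunes endpoint as of this
# writing; confirm against the current API reference before relying on it).
def fine_tune_events_uri(job_id)
  URI("https://api.openai.com/v1/fine-tunes/#{job_id}/events")
end

# Fetch and parse the event list; failure events often explain why a job failed.
def fetch_fine_tune_events(job_id, api_key)
  req = Net::HTTP::Get.new(fine_tune_events_uri(job_id))
  req["Authorization"] = "Bearer #{api_key}"
  res = Net::HTTP.start(req.uri.host, req.uri.port, use_ssl: true) do |http|
    http.request(req)
  end
  JSON.parse(res.body)
end

# Usage: fetch_fine_tune_events("ft-AKZ9a6vX8oSz9gjadD687pFB", ENV["OPENAI_API_KEY"])
```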
Anyway, I think a basic JSONL validator for fine-tuning can be accomplished using a basic REGEX.
Maybe something like this, applied line by line in a loop (not fully tested); alter it as you like (it needs tweaking, and I’ll work on it later this week):
/^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$/gm
@aydengray2020
Just a quick check of the REGEX above from the Ruby console:
irb(main):022:0> string = '{"prompt": "Hello", "completion": "World"}'
=> "{\"prompt\": \"Hello\", \"completion\": \"World\"}"
irb(main):024:0> /^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$/.match?(string)
=> true
Of course, this REGEX needs more tweaking if you want to account for the details in the reference above, but it’s a start.
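One tweak it will definitely need: the `[^"]+` captures reject any line whose prompt or completion contains an escaped quote (`\"`), even though that is valid JSONL. For example:

```ruby
# The regex from above, as a constant for reuse.
PATTERN = /^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$/

plain   = '{"prompt": "Hello", "completion": "World"}'
escaped = '{"prompt": "Say \"hi\"", "completion": "World"}'

PATTERN.match?(plain)   # => true
PATTERN.match?(escaped) # => false, even though this is valid JSON
```

One possible fix is to swap each `[^"]+` for `(?:[^"\\]|\\.)+`, which also accepts escaped characters inside the strings.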
OK @aydengray2020, maybe you can try something like this to get started until we come up with something better.
Here is an example from the Ruby console:
irb(main):090:1* def validate(fine_tune_data)
irb(main):091:2*   if fine_tune_data.present?
irb(main):092:2*     count = 0
irb(main):093:3*     fine_tune_data.split(/\r?\n/).each do |line|
irb(main):094:3*       count = count + 1
irb(main):095:4*       if /^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$/.match?(line)
irb(main):096:4*         puts "LINE ##{count} VALID JSONL: #{line}"
irb(main):097:4*       else
irb(main):098:4*         puts "LINE ##{count} INVALID JSONL: #{line}"
irb(main):099:3*       end
irb(main):100:2*     end
irb(main):101:2*   else
irb(main):102:2*     return false
irb(main):103:1*   end
irb(main):104:0> end
=> :validate
irb(main):105:0> string = "{\"prompt\": \"Hello\", \"completioan\": \"World\"}\n{\"prompt\": \"Hello\", \"completion\": \"World\"}"
=> "{\"prompt\": \"Hello\", \"completioan\": \"World\"}\n{\"prompt\": \"Hello\", \"completion\": \"World\"}"
irb(main):106:0> validate(string)
LINE #1 INVALID JSONL: {"prompt": "Hello", "completioan": "World"}
LINE #2 VALID JSONL: {"prompt": "Hello", "completion": "World"}
=> ["{\"prompt\": \"Hello\", \"completioan\": \"World\"}", "{\"prompt\": \"Hello\", \"completion\": \"World\"}"]
Not perfect, but I’m going to use something similar to this when I write my JSONL validation method.
I’ll test this further later… I’m sure it needs tweaking:
def validate(fine_tune_data)
  # Guard against nil/empty input (the irb version used present?, which
  # requires Rails/ActiveSupport; this is plain Ruby)
  return false if fine_tune_data.nil? || fine_tune_data.empty?

  count = 0
  # Split on \n or \r\n so both Unix and Windows line endings work
  fine_tune_data.split(/\r?\n/).each do |line|
    count += 1
    if /^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$/.match?(line)
      puts "LINE ##{count} VALID JSONL: #{line}"
    else
      puts "LINE ##{count} INVALID JSONL: #{line}"
    end
  end
end
Hope this helps.
It turns out I did not have enough credits to complete the fine tuning. I increased my usage limits, resent the jsonl and it processed successfully. I assumed ‘lack of credits’ would be a common issue and would throw a known error, but it doesn’t.
Great to hear you figured it out @aydengray2020
You motivated me to write this validation method, so maybe you might find it useful someday if you have a similar problem with JSONL and fine-tuning.
Lots of folks have been frustrated with time-consuming fine-tuning failures and the confusing error messages the API returns after a failure, so I hope this helps a few people at least.
Today I wrote this Ruby method which will validate JSONL and also optionally validate JSONL with the OpenAI API fine-tuning requirements, summarized in the reference below.
Everyone is welcome to test and modify this method, translate the method to your favorite programming language, or post back with suggested improv…
Thanks for the weekend motivation to bang out some code.
Thanks,
I’ll be doing another jsonl test in the coming week and will use your code. I’ll let you know how it goes.
Ayden