Ah, thanks for checking this. I validated the JSONL file with the following Python script, and it didn’t report any errors:
import json

def validate_jsonl_file(file_path):
    """Return True if every line parses as JSON and has both required keys."""
    with open(file_path, 'r') as f:
        for line_num, line in enumerate(f, start=1):
            try:
                json_obj = json.loads(line)
                print("prompt-->", json_obj["prompt"])
                print("completion-->", json_obj["completion"])
            except (ValueError, KeyError) as e:
                # ValueError covers json.JSONDecodeError; KeyError covers a missing field
                print(f"Error parsing line {line_num}: {e}")
                return False
    return True

validate_jsonl_file("/home/jupyter/openai/fine-tuning.jsonl")
I saw your script on the page: Method to Validate JSONL and JSONL+ OpenAI API Fine-Tuning Requirements - #2 by ruby_coder
For the Regex:
^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$
It normally works fine, but it fails when the prompt value or the completion value contains an escaped double quote, for example:
{"prompt":"find all active projects ->","completion":" {\"search_object\":\"active projects\"}"}
Does the fine-tuning JSONL format not support double quotes in the prompt value or the completion value?
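To make the mismatch concrete, here is a minimal sketch that runs the example line through both the regex and json.loads. The names STRICT and ESCAPE_AWARE are mine, and the escape-aware pattern is only my guess at a possible adjustment, not something taken from the linked post or from the OpenAI tooling.

```python
import json
import re

# Pattern quoted above: [^"]+ stops at the first double quote, even an escaped one.
STRICT = re.compile(r'^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$')

# Hypothetical variant (an assumption, not from the linked post): each character
# is either a non-quote, non-backslash character or a backslash plus one character.
ESCAPE_AWARE = re.compile(
    r'^\{"prompt":\s*"((?:[^"\\]|\\.)+)",\s*"completion":\s*"((?:[^"\\]|\\.)+)"\s*\}$'
)

# The example line from this post, kept raw so the backslashes stay literal.
line = r'{"prompt":"find all active projects ->","completion":" {\"search_object\":\"active projects\"}"}'

print("strict regex matches:      ", STRICT.match(line) is not None)
print("escape-aware regex matches:", ESCAPE_AWARE.match(line) is not None)

# json.loads accepts the line and unescapes the inner quotes.
record = json.loads(line)
print("parsed completion:", record["completion"])
```

If this is right, the line itself is valid JSON and only the regex is too strict; parsing each line with json.loads and then checking the keys seems like a safer test than pattern matching on the raw text.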