Method to Validate JSONL and JSONL+ OpenAI API Fine-Tuning Requirements

Chatty translated the Ruby code above to Python, as follows. Seems plausible, but I did not test it.

Chatty (ChatGPT) Translation

  • Check Code Completely and Use with Caution - Not Human Validated :)
import re

def validate_jsonl(fine_tune_data, validate_api=True, prompt_separator="PROMPT_SEPARATOR", completion_stop="STOPSTOP"):
    # Returns False for empty input, otherwise a list with one result dict per line.
    if not fine_tune_data:
        return False
    output = []

    if validate_api:
        # Build the pattern by concatenation: str.format() would treat the
        # literal braces in the pattern as placeholders and raise an error.
        # re.escape() keeps any regex metacharacters in the separators literal.
        regex_to_validate = re.compile(
            r'^\{"prompt":\s*"([^"]+)\s*' + re.escape(prompt_separator)
            + r'",\s*"completion":\s*"\s([^"]+)\s*' + re.escape(completion_stop)
            + r'"\s*\}$'
        )
    else:
        regex_to_validate = re.compile(r'^\{"prompt":\s*"([^"]+)",\s*"completion":\s*"([^"]+)"\s*\}$')

    # Split on "\n" as well as "\r\n" so Unix-style input is handled too.
    for count, line in enumerate(re.split(r'\r?\n', fine_tune_data), start=1):
        validated_line = bool(regex_to_validate.match(line))
        output.append({'count': count, 'valid': validated_line, 'line': line})

    return output
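
One limitation of the regex approach: the pattern treats any double quote as the end of a string, so a line whose text contains an escaped quote will be reported as invalid even if it is valid JSON. Below is a minimal alternative sketch, not part of the original Ruby code or the translation, that parses each line with the standard json module and then checks the separator, the leading space, and the stop sequence on the parsed values. The function name check_jsonl_line and the sample lines are hypothetical, and the checks are only my reading of what the regex above is looking for.

```python
import json

def check_jsonl_line(line, prompt_separator="PROMPT_SEPARATOR", completion_stop="STOPSTOP"):
    """Return True if line is a JSON object with a well-formed prompt/completion pair."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    prompt = record.get("prompt")
    completion = record.get("completion")
    if not isinstance(prompt, str) or not isinstance(completion, str):
        return False
    # Assumed rules: prompt ends with the separator, completion starts
    # with a space and ends with the stop sequence.
    return (
        prompt.endswith(prompt_separator)
        and completion.startswith(" ")
        and completion.endswith(completion_stop)
    )

# A line with escaped quotes inside the prompt text.
sample = '{"prompt": "Say \\"hi\\" PROMPT_SEPARATOR", "completion": " hi STOPSTOP"}'
print(check_jsonl_line(sample))
print(check_jsonl_line('{"prompt": "no separator", "completion": "no stop"}'))
```

Unlike the regex, this sketch does not care about key order or extra keys, so treat it as a complement to the line-by-line report rather than a drop-in replacement.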

Chatty said:

Note: Python uses not instead of ! and True instead of true. Also, Ruby's each method is replaced by a for loop in Python: in this example, each iterates over the lines in the fine_tune_data variable, so the Python version loops over them with for. Finally, Python uses re.compile() instead of Regexp.new() to create a regular expression.
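
To make those three substitutions concrete, here is a tiny sketch with the rough Ruby equivalents in the comments. The sample lines and the deliberately loose pattern are made up for illustration and are not the validation regex itself.

```python
import re

lines = ['{"prompt": "a", "completion": "b"}', 'not a json object']

# Ruby: Regexp.new(...)  ->  Python: re.compile(...)
pattern = re.compile(r'^\{.*\}$')

invalid = []
# Ruby: lines.each do |line| ... end  ->  Python: for line in lines:
for line in lines:
    # Ruby: !pattern.match(line)  ->  Python: not pattern.match(line)
    if not pattern.match(line):
        invalid.append(line)

print(invalid)  # ['not a json object']
```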

Hope this helps Python users.