I tried to write a shell script that would take user and assistant messages as inputs and output a .jsonl file for fine-tuning. I tried using the format validation script, but the script is retuirning a traceback. I wonder if it’s because of jq formatting the JSON. If there’s an easy way to fix the formatting, that would be good. Script is below.
#!/bin/bash
# Define output file
OUTPUT_FILE="fine_tuning_data.jsonl"
# Ensure the output file is empty before starting
> "$OUTPUT_FILE"
# Prompt user for input
echo "Enter training data (type 'exit' to finish):"
while true; do
read -p "User: " user_input
if [[ "$user_input" == "exit" ]]; then
break
fi
read -p "Assistant: " assistant_response
# Create JSONL formatted entry
json_entry=$(jq -n --arg u "$user_input" --arg a "$assistant_response" \
'{messages: [{role: "system", content: "SYSTEM_MESSAGE"}, {role: "user", content: $u}, {role: "assistant", content: $a}]}')
# Append to output file
echo "$json_entry" >> "$OUTPUT_FILE"
done
echo "Data saved to $OUTPUT_FILE"
Sample entry
{
“messages”: [
{
“role”: “system”,
“content”: “SYSTEM_MESSAGE”
},
{
“role”: “user”,
“content”: “USER_MESSAGE”
},
{
“role”: “assistant”,
“content”: “ASSISTANT_MESSAGE”
}
]
}