I want to check a string of text for profanity (after first running it through the moderation endpoint) and have my fine-tuned gpt-3.5-turbo model respond with “true” or “false”. I’ve been following the docs’ instructions on fine-tuning with function calling, but my JSONL always fails validation via the Python scripts (even though the same file uploads without a problem via the fine-tuning UI). Here’s what the docs recommend:
{
  "messages": [
    {"role": "user", "content": "What is the weather in San Francisco?"},
    {"role": "assistant", "function_call": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}"}}
  ],
  "functions": [{
    "name": "get_current_weather",
    "description": "Get the current weather",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {"type": "string", "description": "The city and country, e.g. San Francisco, USA"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["location", "format"]
    }
  }]
}
and here is how I generate my training data (my JS script):
const samplePhrases = [
  { phrase: 'I love playing soccer', isProfane: false },
  // etc...
];
const trainingData = samplePhrases.map(({ phrase, isProfane }) => ({
  messages: [
    {
      role: 'user',
      content: JSON.stringify(phrase),
    },
    {
      role: 'assistant',
      function_call: {
        name: 'isProfane',
        arguments: JSON.stringify({ phrase }),
      },
    },
    {
      role: 'function',
      name: 'isProfane',
      content: String(isProfane),
    },
    {
      role: 'assistant',
      content: String(isProfane),
    },
  ],
  functions: [
    {
      name: 'isProfane',
      description: 'Check the string for profanity',
      parameters: {
        type: 'object',
        properties: {
          phrase: {
            type: 'string',
            description: 'The string to check for profanity',
          },
        },
        required: ['phrase'],
      },
    },
  ],
}));
After I convert that output to JSONL, I get lines that look like this (I’ve confirmed every line parses as valid JSON):
{"messages":[{"role":"user","content":"\"I love playing soccer\""},{"role":"assistant","function_call":{"name":"isProfane","arguments":"{\"phrase\":\"I love playing soccer\"}"}},{"role":"function","name":"isProfane","content":"false"},{"role":"assistant","content":"false"}],"functions":[{"name":"isProfane","description":"Check the string for profanity","parameters":{"type":"object","properties":{"phrase":{"type":"string","description":"The string to check for profanity"}},"required":["phrase"]}}]}
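My “it’s valid JSON” check is nothing fancier than parsing every non-empty line; shown here against an inline sample (in my script I read the file with fs instead):

```javascript
// One sample JSONL line, inlined so this snippet is self-contained.
const jsonl = [
  '{"messages":[{"role":"user","content":"\\"I love playing soccer\\""}]}',
].join('\n');

// Every non-empty line must be independently parseable JSON.
const parsed = jsonl
  .split('\n')
  .filter((line) => line.trim() !== '')
  .map((line) => JSON.parse(line)); // throws if any line is malformed
```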
These always fail validation via the Python scripts with the errors message_missing_key and missing_content, which implies that some of my messages are missing the “role” or “content” keys. But OpenAI’s example shows exactly the same shape for the assistant’s function call — here it is with my function name substituted in:
{
  role: 'assistant',
  function_call: {
    name: 'isProfane',
    arguments: {
      "phrase": JSON.stringify(phrase),
    },
  },
},
which has no content key…so what’s the real issue?
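For what it’s worth, if I mimic the kind of check those error names suggest (this is my guess at what the validation script is doing, not its actual code), the function_call message is exactly the one that trips it:

```javascript
// Hypothetical reconstruction of the check implied by the
// message_missing_key / missing_content error names -- NOT the real script.
const example = {
  messages: [
    { role: 'user', content: '"I love playing soccer"' },
    {
      role: 'assistant',
      function_call: {
        name: 'isProfane',
        arguments: '{"phrase":"I love playing soccer"}',
      },
    },
    { role: 'function', name: 'isProfane', content: 'false' },
    { role: 'assistant', content: 'false' },
  ],
};

// Flag any message missing either key.
const flagged = example.messages.filter(
  (m) => !('role' in m) || !('content' in m)
);
// Only the assistant message that carries function_call (and no content)
// ends up in `flagged`.
```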
Hoping for a more descriptive error, I tried uploading my JSONL file to the UI — and it uploads without complaint.
What’s wrong with how I’m fine-tuning my model? Maybe I shouldn’t be using function calling at all and should just expect the model to return “true” or “false” directly. But I’m hoping to get it to work this way first…
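In case I do abandon function calling, the fallback would be plain user/assistant pairs, where every message has both role and content keys (the system prompt wording here is only a sketch of my own, not anything from the docs):

```javascript
const samplePhrases = [
  { phrase: 'I love playing soccer', isProfane: false },
];

// Plain chat format: no functions key, so the content-key complaint goes away.
const trainingData = samplePhrases.map(({ phrase, isProfane }) => ({
  messages: [
    // Example system prompt -- the exact wording is my own invention.
    {
      role: 'system',
      content: 'Reply "true" if the phrase is profane, otherwise "false".',
    },
    { role: 'user', content: phrase },
    { role: 'assistant', content: String(isProfane) },
  ],
}));
```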