Validation errors with function calling + fine-tuning

I want to check a string of text for profanity (after running it through the moderation endpoint) and have my fine-tuned gpt-3.5-turbo model respond with “true” or “false”. I’ve been trying to accomplish this by following the docs’ instructions on fine-tuning with function calling, but all I get are errors when validating my JSONL via the Python scripts (even though the same file uploads without a problem via the fine-tuning UI). Here’s what the docs recommend:

{
    "messages": [
        {"role": "user", "content": "What is the weather in San Francisco?"},
        {"role": "assistant", "function_call": {"name": "get_current_weather", "arguments": "{\"location\": \"San Francisco, USA\", \"format\": \"celcius\"}"}
    ],
    "functions": [{
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and country, eg. San Francisco, USA"},
                "format": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "format"]
        }
    }]
}

and here is how I generate my training data (my JS script):

const samplePhrases = [
  { phrase: 'I love playing soccer', isProfane: false },
  // ...more phrases
]

const trainingData = samplePhrases.map(({ phrase, isProfane }) => ({
  messages: [
    {
      role: 'user',
      content: JSON.stringify(phrase),
    },
    {
      role: 'assistant',
      function_call: {
        name: 'isProfane',
        arguments: JSON.stringify({ "phrase": phrase }),
      },
    },
    {
      role: 'function',
      name: 'isProfane',
      content: String(isProfane),
    },
    {
      role: 'assistant',
      content: String(isProfane),
    },
  ],
  functions: [
    {
      name: 'isProfane',
      description: 'Check the string for profanity',
      parameters: {
        type: 'object',
        properties: {
          phrase: {
            type: 'string',
            description: 'The string to check for profanity',
          },
        },
        required: ['phrase'],
      },
    },
  ],
}));
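
For completeness, the JSONL conversion step is nothing special. A minimal sketch of what mine does (the file name is arbitrary):

const { writeFileSync } = require('node:fs');

// JSONL is just one JSON object per line, joined by newlines.
writeFileSync(
  'training-data.jsonl',
  trainingData.map((example) => JSON.stringify(example)).join('\n'),
);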

After I convert the JSON output to JSONL, I get a bunch of lines (I’ve verified that each line is valid JSON) that look like this:

{"messages":[{"role":"user","content":"\"I love playing soccer\""},{"role":"assistant","function_call":{"name":"isProfane","arguments":"{\"phrase\":\"I love playing soccer\"}"}},{"role":"function","name":"isProfane","content":"false"},{"role":"assistant","content":"false"}],"functions":[{"name":"isProfane","description":"Check the string for profanity","parameters":{"type":"object","properties":{"phrase":{"type":"string","description":"The string to check for profanity"}},"required":["phrase"]}}]}

These always fail validation via the Python scripts with the errors message_missing_key and missing_content. That implies my messages don’t all have “role” and “content” keys, but following OpenAI’s example, my assistant message that calls the function looks like this:

{
  role: 'assistant',
  function_call: {
    name: 'isProfane',
    arguments: JSON.stringify({ "phrase": phrase }),
  },
},

which likewise has no content key… so what’s the real issue?
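
My best guess at what the script is checking, reconstructed purely from those error names (the real validator is Python; this is just equivalent logic sketched in JS, so take it as a guess, not the actual script):

const errors = { message_missing_key: 0, missing_content: 0 };

for (const example of trainingData) {
  for (const message of example.messages) {
    // Would fire for my assistant message: it has function_call but no content key.
    if (!('role' in message) || !('content' in message)) {
      errors.message_missing_key += 1;
    }
    // Would fire for the same message, since message.content is undefined, not a string.
    if (typeof message.content !== 'string') {
      errors.missing_content += 1;
    }
  }
}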

Hoping for a more descriptive error, I also tried uploading my JSONL file to the UI, but the file uploads fine there.

What’s wrong with how I’m fine-tuning my model? Maybe I shouldn’t be using function calling at all and should just expect the model to return “true” or “false” directly. But I’m hoping to get it to work this way first…

It seems like you are going about this project all wrong. Do you want an AI that can call a function? Because that’s what you are training for: you’re showing the AI either calling a function or reporting the value of the function.

Tell me how, in actual use, you would ever supply user input alone and expect the AI to ignore all its fine-tuning and only call an “isProfane” function. Is this AI calling the function, or is it performing the function?

You also don’t have a unique system message, and you don’t have null content on the assistant message where it emits the function call. Either may be required.
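
Concretely, a training line with both of those fixes applied would look something like this (the system prompt wording here is just an illustration):

{"messages":[{"role":"system","content":"You are a profanity classifier."},{"role":"user","content":"I love playing soccer"},{"role":"assistant","content":null,"function_call":{"name":"isProfane","arguments":"{\"phrase\":\"I love playing soccer\"}"}},{"role":"function","name":"isProfane","content":"false"},{"role":"assistant","content":"false"}],"functions":[{"name":"isProfane","description":"Check the string for profanity","parameters":{"type":"object","properties":{"phrase":{"type":"string","description":"The string to check for profanity"}},"required":["phrase"]}}]}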

Ahh, I think I’ve been approaching this the wrong way. In reality I just want the fine-tuned model itself to produce structured output (return “true” or “false”); it doesn’t need to call an isProfane function at all. So I can drop function calling entirely and add a system message instead.
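
In case anyone finds this later, here is roughly what my generator becomes without function calling (the system prompt is just my first attempt):

const trainingData = samplePhrases.map(({ phrase, isProfane }) => ({
  messages: [
    {
      role: 'system',
      content: 'Respond with "true" if the text contains profanity, otherwise respond with "false".',
    },
    // Plain string this time, no JSON.stringify, so the content isn't double-quoted.
    { role: 'user', content: phrase },
    { role: 'assistant', content: String(isProfane) },
  ],
}));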

Thanks!