Function calls in 3.5-turbo-0613 for compound text processing produce unreliable results

I have been comparing the function call syntax in gpt-3.5-turbo-0613 and gpt-4-0613 against the non-function-call syntax for compound text processing tasks (i.e. doing many things to the input text in one API call) that output JSON. I was hoping to switch to the function call syntax with gpt-3.5-turbo-0613 because of its lower cost and higher speed, but so far the function call API produces unreliable results.

Specifically, I pass in a voice transcript (from Whisper) and ask the model to perform 4 tasks and return the results in a json object:

  1. Generate a title based on the contents
  2. Split the transcript into paragraphs
  3. Extract one-word categories from the transcript
  4. Extract action items if the transcript contains todos/action items
  • With the function call syntax, the model fairly reliably outputs the JSON object, but with empty values for every field except the transcript, and it returns the input transcript as is, without breaking it into paragraphs. The same prompt without the function call syntax produces the expected results. Note: I’m not using the system prompt for 3.5.
  • In these requests, temperature=0 and function_call="auto".

Here’s the functions array I’m passing:

"functions": [
    {
      "name": "processEntryData",
      "parameters": {
        "type": "object",
        "properties": {
          "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
          "title": {"type": "string", "description": "The generated title from the transcript contents"},
          "action_items": {"type": "array", "items": {"type": "string"}},
          "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the note"}
        },
        "required": ["transcript", "title", "categories"]
      }
    }
  ],
  • In contrast, with gpt-4-0613 the function call syntax works as expected, reliably producing the desired results in a JSON object.
  • Similarly, I get the desired results with gpt-3.5-turbo when I remove the function call syntax.

I imagine these sorts of compound tasks may not be what the model is trained to do, compared to a simple query like the “what’s the weather in boston?” example.

I wanted to share this experience as a guide for others, and I’m also curious whether anyone else is seeing similar results.


Thank you for your insights.

GPT-3.5 is very sensitive when it comes to function calling.

First, the function itself needs a description field (if there is more than one function, or if it is not clear from the context which function to call).
Second, some of the parameters lack concise descriptions too, so GPT-3.5 will decline to fill them.
Also, the wording is not consistent between the function name and the parameter descriptions: in one description you call it “transcript text”, in another “transcript content”, and in another “the note”. This confuses GPT-3.5.

For GPT-3.5 it is also important to use a low temperature.

Hope this helps a little bit.


Thanks for your thoughts here. For clarity I’ve included the entire request body (transcript excluded) below. The results are the same as described above.

  • I’ve fixed the inconsistencies in how the transcript is referred to; I now stick to “transcript text”
  • I added a function description
  • I’ve included the redacted response, which shows the empty title and categories fields

Request body

{
  "model": "gpt-3.5-turbo-0613",
  "temperature": 0,
  "function_call": "auto",
  "functions": [
    {
      "name": "saveVoiceTranscriptDetails",
      "description": "Takes voice transcript text, title, categories, and action items and saves them to the database",
      "parameters": {
        "type": "object",
        "properties": {
          "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
          "title": {"type": "string", "description": "The title generated from the transcript text"},
          "action_items": {"type": "array", "items": {"type": "string"}},
          "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the transcript"}
        },
        "required": ["transcript", "title", "categories"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content":
        "You are a text processing and classification AI assistant. 
Perform these 4 tasks to process the user voice transcript included below.
Tasks:\n\n
1. Generate a title from the transcript's raw text.\n
2. Split the raw transcript text into logical paragraphs based on it's contents. Separate paragraphs with two `\\n` newline characters.\n
3. Generate up to three one-word descriptive categories based on the transcript text.\n.
4. If there are tasks to be remembered in the transcript, extract those as action items.\n
You MUST ALWAYS return the results as a json object.\n\n
Transcript:[TRANSCRIPT]"
    }
  ]
}

Response

{
  "id": "chatcmpl-7T5KdICDvLCNa8O4VjuTZ6HXxDyC6",
  "object": "chat.completion",
  "created": 1687166363,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "saveTranscriptMetadata",
          "arguments": "{\n  \"transcript\": \"[TRANSCRIPT]\",\n  \"title\": \"\",\n  \"categories\": []\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 600,
    "completion_tokens": 414,
    "total_tokens": 1014
  }
}
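
The failure mode in the response above (a function call is returned, but the required fields are empty) can be detected in code before the result is used. Below is a minimal sketch, assuming the response has already been parsed into a Python dict shaped like the one above. The helper name `find_empty_fields` is hypothetical and not part of any library.

```python
import json

# Required fields taken from the "required" list in the functions array.
REQUIRED_FIELDS = ["transcript", "title", "categories"]


def find_empty_fields(response: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are missing or empty in the
    function-call arguments of the first choice (hypothetical helper)."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        # The model answered in plain content instead of calling the function.
        return list(required)
    # "arguments" is a JSON-encoded string, not a nested object.
    args = json.loads(call["arguments"])
    return [name for name in required if not args.get(name)]


# Trimmed copy of the redacted response shown above.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "saveTranscriptMetadata",
                    "arguments": "{\n  \"transcript\": \"[TRANSCRIPT]\",\n  \"title\": \"\",\n  \"categories\": []\n}",
                },
            },
            "finish_reason": "function_call",
        }
    ]
}

print(find_empty_fields(response))
```

A non-empty result could be used to retry the request or to fall back to the plain prompt without functions.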

This looks very consistent now.
Please try putting your commands/instructions into the system-role message and the transcript itself into the user-role message.
Also reorder the numbered instructions so they follow the same order in which the parameters appear in the properties blob.
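
One way to read the two suggestions above is sketched here: the instructions move to a system message, the transcript goes in the user message, and the tasks are listed in the same order as the schema properties (transcript, title, action_items, categories). The `build_request` helper and the exact prompt wording are assumptions for illustration, not the original poster's final request. The description added for `action_items` is likewise an assumption.

```python
import json

# Instructions in the same order as the properties in the schema below.
SYSTEM_PROMPT = (
    "You are a text processing and classification AI assistant. "
    "Perform these 4 tasks on the user's voice transcript text:\n"
    "1. Split the transcript text into logical paragraphs, separated by two newline characters.\n"
    "2. Generate a title from the transcript text.\n"
    "3. If the transcript text contains tasks to remember, extract them as action items.\n"
    "4. Generate up to three one-word categories describing the transcript text.\n"
)

FUNCTIONS = [
    {
        "name": "saveVoiceTranscriptDetails",
        "description": "Takes voice transcript text, title, categories, and action items and saves them to the database",
        "parameters": {
            "type": "object",
            "properties": {
                "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two newline characters"},
                "title": {"type": "string", "description": "The title generated from the transcript text"},
                # Assumed description; the original schema had none for this field.
                "action_items": {"type": "array", "items": {"type": "string"}, "description": "Action items extracted from the transcript text"},
                "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the transcript text"},
            },
            "required": ["transcript", "title", "categories"],
        },
    }
]


def build_request(transcript: str) -> dict:
    """Build a request body with instructions in the system role (hypothetical helper)."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "temperature": 0,
        "function_call": "auto",
        "functions": FUNCTIONS,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Transcript:\n" + transcript},
        ],
    }


body = build_request("[TRANSCRIPT]")
print(json.dumps(body, indent=2))
```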

Ah, interesting, you’re right: 3.5-turbo-0613 seems to respond to system messages where the prior version of the model did not! There’s still some slightly odd behavior if I specify "function_call": "auto", where it returns the payload in both the function arguments key and the normal content key, but this looks promising!

If you want GPT to only call the function, then say so, e.g. “Only call the function and do not respond to the user.”
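
Besides the prompt instruction, the Chat Completions API also accepts "function_call": {"name": "..."} in place of "auto" to force a call to a specific function. On the client side, the duplicated payload can be handled defensively. The sketch below is only an illustration under those assumptions: `extract_payload` is a hypothetical helper that prefers the function arguments and falls back to the content key.

```python
import json
from typing import Optional

# Request fragment that forces the named function instead of "auto".
FORCED_CALL = {"function_call": {"name": "saveVoiceTranscriptDetails"}}


def _try_parse(text: Optional[str]) -> Optional[dict]:
    """Parse a JSON object from text, returning None if it is absent or invalid."""
    if not text:
        return None
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def extract_payload(message: dict) -> Optional[dict]:
    """Prefer function_call.arguments; fall back to content (hypothetical helper)."""
    call = message.get("function_call") or {}
    return _try_parse(call.get("arguments")) or _try_parse(message.get("content"))


# Example: the payload shows up in both places, as described above.
message = {
    "role": "assistant",
    "content": "{\"title\": \"From content\"}",
    "function_call": {
        "name": "saveVoiceTranscriptDetails",
        "arguments": "{\"title\": \"From arguments\"}",
    },
}
print(extract_payload(message))
```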

Got it – awesome, thanks so much for the help. Some of this info should go in the docs/guides :pray:
