Function calls in 3.5-turbo-0613 for compound text processing produce unreliable results

ajsharp · June 19, 2023, 8:12am

I have been comparing gpt-3.5-turbo-0613 and gpt-4-0613, using the function call syntax versus the non-function-call syntax, on compound text processing tasks (i.e. doing many things to the input text in one API call) that output JSON. I was hoping to switch to the function call syntax with gpt-3.5-turbo-0613 because of its lower cost and higher speed, but so far the function call requests produce unreliable results.

Specifically, I pass in a voice transcript (from Whisper) and ask the model to perform 4 tasks and return the results in a JSON object:

1. Generate a title based on the contents
2. Split the transcript into paragraphs
3. Extract one-word categories from the transcript
4. Extract action items if the transcript contains todos/action items

What happens fairly reliably with the function call syntax is that the model outputs the JSON object, but with empty values for every field except the transcript, and it returns the input transcript as-is without splitting it into paragraphs. The same prompt without the function call syntax produces the expected results. Note: I'm not using the system prompt for 3.5.
In these requests I set temperature=0 and function_call="auto".
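A cheap way to detect this failure mode is to check the decoded function arguments for required fields that came back missing or empty. Below is a minimal Python sketch; `find_empty_required` is a hypothetical helper (not part of any SDK), and it assumes the `arguments` string has already been parsed into a dict:

```python
from typing import Any


def find_empty_required(arguments: dict[str, Any], required: list[str]) -> list[str]:
    """Return the required fields that are missing, None, or empty ("" or [])."""
    empty = []
    for field in required:
        value = arguments.get(field)
        # Missing keys, None, empty strings and empty lists all count as "not filled in".
        if value is None or (isinstance(value, (str, list)) and len(value) == 0):
            empty.append(field)
    return empty


# Mirrors the behaviour described above: the transcript comes back, the rest is empty.
args = {"transcript": "[TRANSCRIPT]", "title": "", "categories": []}
print(find_empty_required(args, ["transcript", "title", "categories"]))
# → ['title', 'categories']
```

If the returned list is non-empty, the request could be retried or sent through the non-function-call prompt instead.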

Here’s the functions array I’m passing:

"functions": [
    {
      "name": "processEntryData",
      "parameters": {
        "type": "object",
        "properties": {
          "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
          "title": {"type": "string", "description": "The generated title from the transcript contents"},
          "action_items": {"type": "array", "items": {"type": "string"}},
          "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the note"}
        },
        "required": ["transcript", "title", "categories"]
      }
    }
  ],

In contrast, with gpt-4-0613 the function call syntax works as expected, reliably producing the desired results in a JSON object.
Similarly, I get the desired results with gpt-3.5-turbo when I remove the function call syntax.

I suspect compound tasks like these may not be what the model was trained for, compared to simple queries like the “what's the weather in Boston?” examples.

I wanted to share this experience as a guide for others, and I'm also curious whether anyone else is seeing similar results.

PriNova · June 19, 2023, 8:36am

Thank you for your insights.

GPT-3 is very sensitive when it comes to function calling.

First, the function itself needs a description field (especially if there is more than one function, or if the context doesn't make clear which function to call).
Second, the other parameters lack concise descriptions too (`action_items` has none at all), and GPT-3 will then decline to fill them in.
For example, the name of the function doesn't fit the parameter descriptions, and the wording is inconsistent: in one description you call it “transcript text”, in another “transcript contents”, and in the other “the note”. This confuses GPT-3.
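The first two points can be checked mechanically before a request is sent. A rough Python sketch, applied to the original `processEntryData` spec from this thread; the `missing_descriptions` helper is hypothetical, and it only inspects the function itself and its top-level properties, so the consistency point still needs a human read:

```python
from typing import Any


def missing_descriptions(function_spec: dict[str, Any]) -> list[str]:
    """List the function and top-level parameters that have no "description"."""
    missing = []
    if not function_spec.get("description"):
        missing.append(function_spec.get("name", "<unnamed function>"))
    properties = function_spec.get("parameters", {}).get("properties", {})
    for prop_name, prop_schema in properties.items():
        if not prop_schema.get("description"):
            missing.append(f"parameters.{prop_name}")
    return missing


# The first functions entry from this thread, rewritten as a Python dict.
process_entry_data = {
    "name": "processEntryData",
    "parameters": {
        "type": "object",
        "properties": {
            "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
            "title": {"type": "string", "description": "The generated title from the transcript contents"},
            "action_items": {"type": "array", "items": {"type": "string"}},
            "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the note"},
        },
        "required": ["transcript", "title", "categories"],
    },
}

print(missing_descriptions(process_entry_data))
# → ['processEntryData', 'parameters.action_items']
```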

Also, for GPT-3 it is important to use a low temperature.

Hope this helps a little bit.

ajsharp · June 19, 2023, 9:26am

Thanks for your thoughts here. For clarity I’ve included the entire request body (transcript excluded) below. The results are the same as described above.

- I've fixed the inconsistencies in how the transcript is referred to; I now consistently say “transcript text”.
- I added a function description.
- I've included the redacted response, which shows the empty title and categories fields.

Request body

{
  "model": "gpt-3.5-turbo-0613",
  "temperature": 0,
  "function_call": "auto",
  "functions": [
    {
      "name": "saveVoiceTranscriptDetails",
      "description": "Takes voice transcript text, title, categories, and action items and saves them to the database",
      "parameters": {
        "type": "object",
        "properties": {
          "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
          "title": {"type": "string", "description": "The title generated from the transcript text"},
          "action_items": {"type": "array", "items": {"type": "string"}},
          "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the transcript"}
        },
        "required": ["transcript", "title", "categories"]
      }
    }
  ],
  "messages": [
    {
      "role": "user",
      "content":
        "You are a text processing and classification AI assistant. 
Perform these 4 tasks to process the user voice transcript included below.
Tasks:\n\n
1. Generate a title from the transcript's raw text.\n
2. Split the raw transcript text into logical paragraphs based on it's contents. Separate paragraphs with two `\\n` newline characters.\n
3. Generate up to three one-word descriptive categories based on the transcript text.\n.
4. If there are tasks to be remembered in the transcript, extract those as action items.\n
You MUST ALWAYS return the results as a json object.\n\n
Transcript:[TRANSCRIPT]"
    }
  ]
}
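One caveat about the request body as pasted here: JSON strings may not contain raw line breaks, so the multi-line `content` value would need its newlines escaped as `\n` before it parses. Building the body in code sidesteps hand-escaping. A minimal Python sketch using only the standard library; the prompt is shortened and lightly adapted, `[TRANSCRIPT]` stays a placeholder, and actually sending the request is left out:

```python
import json

# Shortened, lightly adapted version of the prompt above; [TRANSCRIPT] is a placeholder.
prompt = (
    "You are a text processing and classification AI assistant.\n"
    "Perform these 4 tasks to process the user voice transcript included below.\n"
    "Tasks:\n\n"
    "1. Generate a title from the transcript's raw text.\n"
    "2. Split the raw transcript text into logical paragraphs based on its contents.\n"
    "3. Generate up to three one-word descriptive categories based on the transcript text.\n"
    "4. If there are tasks to be remembered in the transcript, extract those as action items.\n"
    "You MUST ALWAYS return the results as a json object.\n\n"
    "Transcript:[TRANSCRIPT]"
)

request_body = {
    "model": "gpt-3.5-turbo-0613",
    "temperature": 0,
    "function_call": "auto",
    "functions": [
        {
            "name": "saveVoiceTranscriptDetails",
            "description": "Takes voice transcript text, title, categories, and action items and saves them to the database",
            "parameters": {
                "type": "object",
                "properties": {
                    "transcript": {"type": "string", "description": "The transcript text separated into paragraphs by two `\n\n` newline"},
                    "title": {"type": "string", "description": "The title generated from the transcript text"},
                    "action_items": {"type": "array", "items": {"type": "string"}},
                    "categories": {"type": "array", "items": {"type": "string"}, "description": "Categories describing the transcript"},
                },
                "required": ["transcript", "title", "categories"],
            },
        }
    ],
    "messages": [{"role": "user", "content": prompt}],
}

# json.dumps escapes every newline inside string values as \n, so the output is valid JSON.
body = json.dumps(request_body, indent=2)
assert json.loads(body) == request_body
print(body)
```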

Response

{
  "id": "chatcmpl-7T5KdICDvLCNa8O4VjuTZ6HXxDyC6",
  "object": "chat.completion",
  "created": 1687166363,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "saveTranscriptMetadata",
          "arguments": "{\n  \"transcript\": \"[TRANSCRIPT]\",\n  \"title\": \"\",\n  \"categories\": []\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 600,
    "completion_tokens": 414,
    "total_tokens": 1014
  }
}
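Two details in this response are worth checking in code: `function_call.arguments` is a JSON-encoded string rather than an object, so it has to be decoded a second time, and the returned `name` (`saveTranscriptMetadata`) is not the function declared in the request as posted (`saveVoiceTranscriptDetails`). A minimal Python sketch, assuming the response has already been parsed into a dict; `extract_function_call` is a hypothetical helper:

```python
import json
from typing import Any


def extract_function_call(response: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (function name, decoded arguments) from the first choice."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        raise ValueError("the model did not call a function")
    # "arguments" is a string containing JSON, so it needs its own json.loads.
    return call["name"], json.loads(call["arguments"])


# The response above, trimmed to the fields this sketch reads.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "saveTranscriptMetadata",
                    "arguments": "{\n  \"transcript\": \"[TRANSCRIPT]\",\n  \"title\": \"\",\n  \"categories\": []\n}",
                },
            },
            "finish_reason": "function_call",
        }
    ]
}

declared_names = {"saveVoiceTranscriptDetails"}
name, arguments = extract_function_call(response)
if name not in declared_names:
    print(f"model called an undeclared function: {name}")
print(arguments)
```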

PriNova · June 19, 2023, 9:33am

This looks very consistent now.
Please try putting your commands/instructions in the system-role message and an example transcript in the user-role message.
Also, reorder the numbered instructions to match the order in which the parameters appear in the properties object.
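A rough Python sketch of that restructuring: the instructions move into a system message, the user message carries the transcript, and the numbered tasks are generated in the same order as the keys of the `properties` object. The task wording is adapted from the original prompt, and `PROPERTY_ORDER`, `TASK_FOR_PROPERTY` and `build_messages` are illustrative names, not from the thread:

```python
# Order of the keys in the function's "properties" object from the request above.
PROPERTY_ORDER = ["transcript", "title", "action_items", "categories"]

# One instruction per property, adapted from the original prompt.
TASK_FOR_PROPERTY = {
    "transcript": "Split the raw transcript text into logical paragraphs separated by two newline characters.",
    "title": "Generate a title from the transcript text.",
    "action_items": "If there are tasks to be remembered in the transcript, extract those as action items.",
    "categories": "Generate up to three one-word descriptive categories based on the transcript text.",
}


def build_messages(transcript: str) -> list[dict[str, str]]:
    """Put the instructions in the system message and the transcript in the user message."""
    # Number the tasks in the same order the parameters appear in "properties".
    tasks = "\n".join(
        f"{number}. {TASK_FOR_PROPERTY[prop]}"
        for number, prop in enumerate(PROPERTY_ORDER, start=1)
    )
    system = (
        "You are a text processing and classification AI assistant.\n"
        "Perform these tasks on the voice transcript in the user message:\n" + tasks
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]


for message in build_messages("[TRANSCRIPT]"):
    print(message["role"], "->", message["content"])
```

Deriving the numbering from one list keeps the prompt and the schema from drifting apart if a parameter is added or reordered later.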

ajsharp · June 19, 2023, 5:19pm

Ah, interesting, you're right: 3.5-turbo-0613 seems to respond to system messages where the prior version of the model did not! There's still some slightly odd behavior when I specify "function_call": "auto", where it returns the payload in both the function_call arguments key and the normal content key, but this looks promising!
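If the payload can show up in either place, one defensive option is to prefer `function_call.arguments` and fall back to parsing `content` as JSON. A minimal Python sketch; `payload_from_message` is a hypothetical helper and assumes the message has already been parsed into a dict:

```python
import json
from typing import Any, Optional


def payload_from_message(message: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Prefer function_call.arguments; otherwise try to parse content as a JSON object."""
    call = message.get("function_call")
    if call is not None:
        return json.loads(call["arguments"])
    content = message.get("content")
    if not content:
        return None
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return None  # plain prose, not JSON
    return parsed if isinstance(parsed, dict) else None


both = {
    "content": '{"title": "From content"}',
    "function_call": {"name": "saveVoiceTranscriptDetails", "arguments": '{"title": "From arguments"}'},
}
print(payload_from_message(both))
# → {'title': 'From arguments'}
print(payload_from_message({"content": '{"title": "Only content"}'}))
# → {'title': 'Only content'}
print(payload_from_message({"content": "Sure, here is the summary."}))
# → None
```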

PriNova · June 19, 2023, 5:22pm

If you want GPT to only call the function, then say so explicitly, e.g. “Only call the function and do not respond to the user.”
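A sketch of applying that advice when building the request in Python. Besides adding the instruction to the system message, the 0613 Chat Completions API also accepts `"function_call": {"name": ...}` to force one specific function instead of `"auto"`; the sketch exposes that as an option. `build_request` and its `force` flag are hypothetical, and the functions list is assumed to be defined already:

```python
from typing import Any

ONLY_CALL = "Only call the function and do not respond to the user."


def build_request(instructions: str, transcript: str,
                  functions: list[dict[str, Any]], force: bool = False) -> dict[str, Any]:
    """Build a chat completion request body that asks for (or forces) a function call."""
    return {
        "model": "gpt-3.5-turbo-0613",
        "temperature": 0,
        "functions": functions,
        # "auto" lets the model decide; {"name": ...} forces the first listed function.
        "function_call": {"name": functions[0]["name"]} if force else "auto",
        "messages": [
            {"role": "system", "content": f"{instructions}\n{ONLY_CALL}"},
            {"role": "user", "content": transcript},
        ],
    }


functions = [{"name": "saveVoiceTranscriptDetails", "parameters": {"type": "object", "properties": {}}}]
request = build_request("You are a text processing and classification AI assistant.", "[TRANSCRIPT]", functions, force=True)
print(request["function_call"])
# → {'name': 'saveVoiceTranscriptDetails'}
```

Forcing the call removes the model's choice entirely, so it probably only makes sense when every request is meant to produce this function's arguments.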

ajsharp · June 19, 2023, 5:24pm

Got it – awesome, thanks so much for the help. Some of this info should go in the docs/guides
