GPT-3.5 function-calling data differs from conversational response (hallucinated song title)

Hi,

I was hoping for some advice on gpt-3.5-turbo via the API using function calls. I’m working on something that recommends songs from film scores based on the user’s request.

For example, I ask “I want songs that sound like a house on fire” and I get the following data, which is MOSTLY correct, but the title is wrong:

{
    "function_name": "get_similar_tracks",
    "duration": 3.840364933013916,
    "song": {
        "song_title": "A House on Fire",
        "composer": "Hans Zimmer",
        "film": "The Dark Knight"
    },
    "logic": "The song 'A House on Fire' from the film score of 'The Dark Knight' by Hans Zimmer has a sense of urgency and intensity that captures the chaotic and destructive nature of a house on fire."
}

If I disable the function call I get the following correct answer:

“Based on your request for songs that sound like a "house on fire," I can interpret this phrase as intense, chaotic, and filled with energy. A song that matches these characteristics is "Molossus" by Hans Zimmer from the film score of "The Dark Knight" (2008). This track captures the feeling of a house on fire with its relentless and frenetic orchestral arrangement, creating a sense of urgency and chaos.”

The second response is fantastic. I seem to be having trouble with the way that I have my function call set up.

My system instruction:

“I want you to act as a song recommender. Analyze the user’s request as a phrase and determine musical characteristics from it. Please recommend a real song from a film score that sounds like those characteristics. The song must be a real song that is on a film’s score, and have been made by a composer. Double check that the song exists by that composer on that film score”

And my function:

[
  {
    "name": "get_film_score_recommendation",
    "description": "Useful when recommending a song based on the characteristics of the user's request",
    "parameters": {
      "type": "object",
      "properties": {
        "song": {
          "type": "object",
          "description": "The real song from the composer from the film score",
          "properties": {
            "song_title": {
              "type": "string",
              "description": "The title of the song"
            },
            "composer": {
              "type": "string",
              "description": "The composer of the song"
            },
            "film": {
              "type": "string",
              "description": "The movie that the song appeared on"
            }
          }
        },
        "logic": {
          "type": "string",
          "description": "A description of the reason for selecting the real song as it relates to the characteristics described by the analysis of the user's request"
        }
      },
      "required": ["song", "logic"]
    }
  }
]

I’ve set the temperature to 0 if that’s at all helpful.
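For reference, here is a minimal sketch of how a spec like this plugs into a Chat Completions request (the legacy `functions` API used in this thread) and how the returned arguments are parsed. The model name, message contents, and the stand-in response string are assumptions, and no actual API call is made.

```python
import json

# Function spec from this post, passed alongside the messages.
functions = [{
    "name": "get_film_score_recommendation",
    "description": "Useful when recommending a song based on the characteristics of the user's request",
    "parameters": {
        "type": "object",
        "properties": {
            "song": {
                "type": "object",
                "description": "The real song from the composer from the film score",
                "properties": {
                    "song_title": {"type": "string", "description": "The title of the song"},
                    "composer": {"type": "string", "description": "The composer of the song"},
                    "film": {"type": "string", "description": "The movie that the song appeared on"},
                },
            },
            "logic": {"type": "string", "description": "Reason for selecting the song"},
        },
        "required": ["song", "logic"],
    },
}]

# Request body as it would be sent to the chat completions endpoint.
request = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "messages": [
        {"role": "system", "content": "I want you to act as a song recommender..."},
        {"role": "user", "content": "I want songs that sound like a house on fire"},
    ],
    "functions": functions,
}
# response = openai.ChatCompletion.create(**request)  # requires an API key

# The function call comes back as a JSON *string* in
# message["function_call"]["arguments"], so it must be parsed before use.
raw_arguments = (
    '{"song": {"song_title": "A House on Fire", "composer": "Hans Zimmer",'
    ' "film": "The Dark Knight"}, "logic": "..."}'
)
args = json.loads(raw_arguments)
```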

Thanks!

The conversational output likely gives the AI room to reason about the token sequence it should produce before it arrives at the point where it writes the song title.

Since release, the AI has been trained even more strongly not to generate chat as a preface before it calls the function, which is exactly the kind of chain-of-thought that would help here, but it can still be done.

Interesting. Can you think of any way that I can prevent it from hallucinating the title? Or maybe even a new approach altogether?

I tried to be more descriptive in the function call object descriptions to no avail.

I accomplished this by having the reasoning fully described in the function-call output before the title is produced, reducing all the prompting down to just that logic, and renaming the function to reflect its purpose.

Also, your nesting of objects was causing the property descriptions to be omitted.

Results:
"function_call": {
  "name": "output_music_recommendation",
  "arguments": "{\n\"reasoning\": \"The description 'a house on fire' suggests a scene of chaos, urgency, and danger. This could be interpreted as a need for music that is intense, fast-paced, and possibly dissonant. A suitable soundtrack might be from a thriller or action movie, where such scenes are common. One such soundtrack is 'Why So Serious?' from 'The Dark Knight' by Hans Zimmer. This track is known for its intense and chaotic sound, which could be likened to a house on fire.\",\n\"song_title\": \"Why So Serious?\",\n\"composer\": \"Hans Zimmer\",\n\"film\": \"The Dark Knight\"\n}"
}

Code segment for constructing API call content

user_name = "api_programming"
ai_name = "MusicFinder"
functions_are_included = True
system_prompt = """
Task: Recommend music selections from existing movie scores. Output only to API function.
""".strip()
prompt = "musical description: I want songs that sound like a house on fire"
allmessages = [
    {
        "role": "system",
        "name": ai_name,
        "content": system_prompt,
    },
    {
        "role": "user",
        "name": user_name,
        "content": "music description: " + prompt,
    },
]

functions = [{
    "name": "output_music_recommendation",
    "description": "Analysis of soundtrack music needed by description, and final selection of composition",
    "parameters": {
        "type": "object",
        "properties": {
            "reasoning": {
                "type": "string",
                "description": "Reasoning Procedures: Examine user's natural language phrase describing a movie scene, a music type, or a sentiment. Reasoning requires lengthy output of these separate AI analyses. - interpret - describe the extracted mood and environment portrayed by the description. - infer - describe similar movie scenes with well-recognized music soundtracks.",
            },
            "song_title": {
                "type": "string",
            },
            "composer": {
                "type": "string",
            },
            "film": {
                "type": "string",
                "description": "Movie where composition was used",
            },
        },
        "required": ["reasoning", "song_title"],
    },
}]
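To complete the setup above, here is a hedged sketch of sending the request with the function call forced, then parsing the arguments. The forced `function_call` parameter and the stand-in response string are assumptions, and the network call itself is commented out; everything else runs offline.

```python
import json

# Reasoning-first schema: arguments are generated left to right, so placing
# "reasoning" before "song_title" makes the model write its analysis before
# committing to a title.
functions = [{
    "name": "output_music_recommendation",
    "parameters": {
        "type": "object",
        "properties": {
            "reasoning": {"type": "string"},
            "song_title": {"type": "string"},
            "composer": {"type": "string"},
            "film": {"type": "string"},
        },
        "required": ["reasoning", "song_title"],
    },
}]

request = {
    "model": "gpt-3.5-turbo",
    "temperature": 0,
    "messages": [
        {"role": "system", "content": "Task: Recommend music selections from existing movie scores. Output only to API function."},
        {"role": "user", "content": "I want songs that sound like a house on fire"},
    ],
    "functions": functions,
    "function_call": {"name": "output_music_recommendation"},  # force the function
}
# response = openai.ChatCompletion.create(**request)  # requires an API key
# raw = response["choices"][0]["message"]["function_call"]["arguments"]

# Stand-in for a model response, for illustration only.
raw = (
    '{"reasoning": "Chaos and urgency call for an intense action-score cue.",'
    ' "song_title": "Why So Serious?", "composer": "Hans Zimmer",'
    ' "film": "The Dark Knight"}'
)
rec = json.loads(raw)
print(rec["song_title"])
```

Forcing `function_call` to a specific function name guarantees the model answers through the schema rather than falling back to a plain chat reply.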

I’m seeing great results with gpt-4, but unfortunately gpt-3.5 doesn’t want to comply. Did you use 3.5 in your test?

Yes, that was gpt-4. Consider gpt-3.5-turbo to have been effectively broken about a week ago, unable to follow instructions, with no alternative if you need functions. You can see how the -16k version performs, since its architecture and attention are a bit different. You can also try moving duplicate instructions into the user message.

Also, I labeled the user input "music description" in two different places in my example code; you can remove it from the prompt string.

Ok, thanks so much for your help!