GPT-4o vs. gpt-4-turbo - function calling

Hi all,

Just want to know that I’m not alone.

I use gpt-4-turbo for text generation (pretty simple task). The output is a JSON object through function calling (for output stability). I tried the same prompt with gpt-4o, but it seems to ignore the instruction to use function calling. Compared to gpt-4-turbo, which uses function calling in 95% of cases, gpt-4o doesn’t use it at all. It just returns the JSON object in the content.
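For anyone who wants to reproduce this, a minimal comparison harness might look like the sketch below (the tool definition and prompt are placeholders, not the ones from my app):

from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for the real tool definition
tools = [{
    "type": "function",
    "function": {
        "name": "emit_result",
        "description": "Return the generated text as a structured JSON object",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]

for model in ("gpt-4-turbo", "gpt-4o"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a one-line tagline for a bakery."}],
        tools=tools,
    )
    msg = response.choices[0].message
    # gpt-4-turbo usually populates tool_calls; gpt-4o may put the JSON in content instead
    print(model, "| tool_calls:", msg.tool_calls, "| content:", msg.content)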

Anyone else experience this?

7 Likes

I’m using {"tool_choice": "required"} to guarantee it does choose a tool (Pydantic response model).
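In code that’s just one extra argument (a sketch; tools holds the schema generated from my Pydantic model):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="required",  # the model must call some tool, though it may still botch the schema
)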

HOWEVER, I noticed 4o does not consistently respect the JSON schema. For example, if it chooses to return markdown, it will completely ignore the required JSON schema. GPT-4 Turbo, on the other hand, handles this very consistently: even when returning markdown, it will put the markdown content in the required JSON field.

1 Like

Same here: gpt-4o does not follow the JSON schema reliably.

You are not alone. However, it is very strange, because yesterday I had to make a slight prompt change for our bot to start working, and it worked flawlessly all night. Then I woke up this morning and it won’t call the correct function no matter what we do.

We are experiencing similar issues. The model is ignoring the required fields defined in the tools definition and calling functions with missing info.

I’ve also noticed that if the function returns an error, the model ignores it and continues to the next instruction step as if it had received a success output.
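For context, the error goes back to the model as an ordinary tool message, roughly like this sketch (the error text is made up):

messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": "Error: connection timed out",  # hypothetical failure result
})
# gpt-4o then proceeds to the next step as if this had succeeded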

1 Like

Same problem for me. gpt-4o does use the tool I give it, but it uses it incorrectly. Here are the exact steps to reproduce. I give the model the following tool:

{
    "type": "function",
    "function": {
        "name": "execute_python_code",
        "description": "Executes a python program, returning the standard output",
        "type": "Object",
        "properties": {
            "code": {
                "type": "string",
                "description": "A python program with print statements"
            }
        },
        "required": "code"
    }
}

And I give the model a task that requires the tool:

"messages": [
    {
      "role": "system",
      "content": "Use the provided tool to perform calculations for the user."  
    },
    {
        "role": "user",
        "content": "What's 1.23^4?"
    }
]

gpt-4-turbo-preview uses the tool correctly (as does gpt-3.5-turbo):

"function": {
    "name": "execute_python_code",
    "arguments": "{\"code\":\"result = 1.23**4\\nprint(result)\"}"
}

However, gpt-4o fails to supply the arguments in JSON form, and instead just dumps the code:

"function": {
    "name": "execute_python_code",
    "arguments": "base = 1.23\nexponent = 4\nresult = base ** exponent\nresult"
}

This feels like a fairly run-of-the-mill use case – is there anything I’m doing wrong? If not, I’m tempted to say this is a bug with gpt-4o.
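In the meantime I’m parsing defensively; a sketch (parse_code_argument is my own hypothetical helper):

import json

def parse_code_argument(raw_arguments: str) -> str:
    """Return the 'code' argument, tolerating models that dump raw code."""
    try:
        return json.loads(raw_arguments)["code"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # gpt-4o sometimes returns the code itself instead of a JSON object
        return raw_arguments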

1 Like

I’m having similar issues. GPT-4o keeps trying to call a function instead of just responding in a LangGraph group chat. The agent that keeps failing only has one tool (which it uses correctly), but it then calls nonexistent tools.

I’ve tried playing with prompts all over the place to resolve this, but it consistently messes up.

FYI, I think the "type": "object" and the "properties" block should be nested under a "parameters" key. That being said, I don’t think this should cause GPT-4 Turbo and GPT-4o to work differently.

Nice catch, thanks! I fixed it and can confirm that the results are unchanged.
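For reference, the corrected definition nests "type", "properties", and "required" under a "parameters" key, with "required" as an array (this is the documented tool format):

{
    "type": "function",
    "function": {
        "name": "execute_python_code",
        "description": "Executes a python program, returning the standard output",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "A python program with print statements"
                }
            },
            "required": ["code"]
        }
    }
}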

If you are willing to share some examples (prompt + function definition), I’m happy to take a look on our side.

1 Like

Hi @brianz-oai,
I am facing the same issue. The model is ignoring the required fields defined in the tools definition and calling functions with missing info.
Please suggest any modifications I need to make to the function-calling description.

{
    "name": "getcustomername",
    "description": "Strictly execute this function once the user confirms his name and surname after showing interest in a personal loan.",
    "parameters": {
        "type": "object",
        "properties": {
            "customername": {
                "type": "string",
                "description": "This is the first name provided by the customer"
            },
            "customersurname": {
                "type": "string",
                "description": "Surname provided by the user"
            }
        },
        "required": ["customername"]
    }
}

Managed to get it working using:

            model="gpt-4o-2024-05-13",

My understanding is that "gpt-4o-2024-05-13" is the date-specific snapshot that the "gpt-4o" alias currently points to (so they are the same model). I did try it and saw one working run, followed by more broken runs.

@brianz-oai Here is a LangSmith trace of a run that clearly fails to pick up on the given schema. I’m not allowed to post a link, but the trace is available on the LangSmith website (smith.langchain.com) under /public/40e38079-fc4a-4da6-b6cd-e36b2869380d/r

Here’s the schema of the function tool provided:

{
  "name": "Answer",
  "description": "Response to the question, containing 3 keys: answer, reflection, search_queries",
  "parameters": {
    "type": "object",
    "properties": {
      "answer": {
        "description": "~250 word detailed answer to the question",
        "type": "string"
      },
      "reflection": {
        "description": "Your reflection/critiques of your answer",
        "allOf": [
          {
            "title": "Reflection",
            "type": "object",
            "properties": {
              "missing": {
                "title": "Missing",
                "description": "Critique of what is missing",
                "type": "string"
              },
              "superfluous": {
                "title": "Superfluous",
                "description": "Critique of what is superfluous",
                "type": "string"
              }
            },
            "required": [
              "missing",
              "superfluous"
            ]
          }
        ]
      },
      "search_queries": {
        "description": "The final top level key containing 1-3 search queries to use for researching improvements to address the critiques of your answer",
        "type": "array",
        "items": {
          "type": "string"
        }
      }
    },
    "required": [
      "answer",
      "reflection",
      "search_queries"
    ]
  }
}

And here are the messages:

  [
    {
        "content": "You are an expert online researcher\n      Current time: 2024-06-11T09:08:54.991070\n\n      Here are your instructions:\n      1. Provide a detailed ~250 word answer.\n      2. Reflect and Critique your answer to step 1.  Be severe to maximise improvement.\n      3. Recommend search queries to research information that will assist in improving your answer.\n      ",
        "type": "system"
      },{
        "content": "Who is the most popular musician in the United States right now?",
        "type": "human"
      },{
        "content": "Answer the user's question above using the required format.",
        "type": "system"
      }
  ]

gpt-4o is nesting "search_queries" under the "reflection" key, when it’s supposed to be a top-level key. I realize this is kind of a nuanced schema, but gpt-4o pretty consistently produces the same error (90% of the time), regardless of how I try to phrase the field descriptions, etc. And gpt-4-turbo consistently produces a correct result.
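One way to catch the mis-nesting programmatically is to validate the returned arguments against the declared schema before using them; a sketch using the jsonschema package (assumed installed):

import json
from jsonschema import validate  # pip install jsonschema

def check_tool_args(raw_arguments: str, parameters_schema: dict) -> dict:
    """Parse tool-call arguments and verify them against the declared schema."""
    args = json.loads(raw_arguments)
    # Raises jsonschema.ValidationError when, e.g., search_queries
    # ends up nested under reflection instead of at the top level
    validate(instance=args, schema=parameters_schema)
    return args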

3 Likes

I’m also having issues with GPT-4o calling functions.

Sometimes it will actually just send JSON as an assistant message instead of a function call, which is a problem because that is shown to the user.

Another issue, though, is that it seems to ignore the order of parameters. gpt-4-turbo provided the parameters in the specified order, allowing me to leverage chain of thought in the function itself, whereas GPT-4o ignores the order and provides the chain of thought last (making it useless). I even tried adding a bunch of prompts and hints telling it to provide the chain of thought ("rationale") as the first argument, but they are ignored (see the key-order check after the function definition below).

  {
      "name": "run_query",
      "description": "run an sql query to get data for the user. Do not provide a step by step explanation of the rationale to the user, save that for the rationale argument of this function. Always provide rationale as the first parameter for this function before providing the query.",
      "parameters": {
          "type": "object",
          "properties": {
              "rationale": {
                  "type": "string",
                  "description": "a step by step explanation of how to write the query, given the metadata (including any descriptions/notes), the example queries (if any are provided), and the users request. Always provide this as the first parameter for this function before the query.",
              },
              "query": {
                  "type": "string",
                  "description": f"valid {self.database.db_type} SQL query to be executed on the users database. When using subqueries, make sure the variables you query from the subquery are actually present in the subquery result."
                                 f"{self.metric_creation_message if self.chat.isMetricCreationChat else ''}"
                                 f"{' NEVER specify the BigQuery project id in SQL code, the project id has already been set. ' if self.database.db_type == 'BigQuery' else ''}"
                                 f" Be sure to take into account what is actually possible in {self.database.db_type}, especially given the types"
                                 f" Avoid ambiguous column errors when joining tables with common column names."
                                 f" For quarterly/weekly/monthly queries, put the date info in a single date column with the full start date of the period and ensure it is a date type."
                                 f" Use sufficient line breaks to avoid long lines, and always sufficient indentation to make the query readable."
                                 f" Try to get readable results, ex. when querying for products avoid returning just their ids, try to include their names if available."
                                 f''' {'Use quotation marks ("TableName"."ColumnName") in PostgreSQL queries for any schemas, tables, or columns with capital letters. Do not use quotation marks for lower case identifiers.' if self.database.db_type == 'PostgreSQL' and False else ''}'''
              },
              "name": {
                  "type": "string",
                  "description": "very short name for describing the query",
              }
          },
          "required": ["rationale", "query", "name"],
      },
  },
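A quick way to see the order the model actually emitted (a sketch; tool_call stands for the tool call object in the SDK response, and json.loads preserves key order):

import json

args = json.loads(tool_call.function.arguments)
# Python dicts preserve insertion order, so this mirrors the order
# in which the model wrote the keys
print(list(args.keys()))  # expected: ['rationale', 'query', 'name']; gpt-4o often emits 'rationale' last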

Yep.

Because it’s trained on JSON, not your API.

You have to set a pattern.

I don’t think that’s an acceptable explanation, since it was working with GPT-4-turbo, as the OP said.

I am having the same issue, but it is happening even more (and much worse) with fine-tuned models. I am wondering if this (pure ignorant speculation on my part) could have something to do with the weight the model is giving to either the conversation history or the fine-tuning data. I even tried including examples of function call responses in JSONL formatting to train one of my fine-tuned models and it still failed miserably.

1 Like

I am having similar issues. gpt-3.5 works fine; however, changing the model to gpt-4o-mini fails. In my case gpt-4o works, but I need to be cost-conscious. Do you have experiences to share, and possible solutions?