GPT-4o cannot properly call custom functions more than half the time

When GPT-4o tries to call a custom function, it frequently doesn’t actually call the function. Instead, it prints the function call in the normal assistant response.

Sometimes it works fine, but most of the time it prints the function call and any arguments that go with it right in the assistant response. This makes the model unusable when you need custom functions to work, not to mention it confuses the hell out of users.

What can I do? It doesn’t seem to matter how I word the function description. GPT-4-turbo, GPT-4, and GPT-3.5-turbo all work fine with custom functions in my code.


Are you setting the function_call parameter to force it to use a given function for chat completions, or the equivalent tool_choice param for thread runs?
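In other words, something like this rough sketch with the Node SDK (the get_weather tool is just a placeholder; tool_choice is the current replacement for the deprecated function_call parameter):

import OpenAI from 'openai';

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather', // placeholder function
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
  // Force this specific function instead of letting the model decide ('auto').
  tool_choice: { type: 'function', function: { name: 'get_weather' } },
});

console.log(completion.choices[0].message.tool_calls);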


Yes, using the function call parameter.

I should mention that I am using the Assistants API v2

Having a similar issue.

My assistants keep saying “Okay, I’m going to [use the function] now” and then they don’t. GPT-4-Turbo didn’t seem to have this problem as often.

FYI I’m not having this issue with Chat Completions.

I’m having the exact opposite issue. GPT-4o keeps trying to call a non-existent function instead of just responding in a LangGraph group chat. The agent that keeps failing has only one tool (which it uses correctly), but it then calls a non-existent tool.

I’ve tried playing with prompts all over the place to resolve this, but it consistently messes up.

I’m using Chat Completions and it’s correctly returning a function call, but I’m definitely seeing worse performance than gpt-4. It’s generally making worse decisions, not obeying enums as well as before, and often calling only one function where multiple are needed.
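(For context, by “enums” I mean constrained parameter values in the tool schema; a hypothetical example of the kind of definition gpt-4o now drifts outside of:)

const setStatusTool = {
  type: 'function' as const,
  function: {
    name: 'set_status', // hypothetical function
    description: 'Update the ticket status',
    parameters: {
      type: 'object',
      properties: {
        // gpt-4 stuck to these values; gpt-4o returns values outside the list more often.
        status: { type: 'string', enum: ['open', 'pending', 'closed'] },
      },
      required: ['status'],
    },
  },
};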

I am using the GPT Assistants API and facing the exact same issue. Did you find out what was wrong with it?

What’s the solution to this?

I agree, and I find that function calling in GPT-4o is unusable compared to GPT-4 Turbo. In some cases I’m seeing the same function called two or three times, with the parameters correct for one call and completely made up for the others. I’m also seeing function calls returned in the assistant response. And when instructed in the system message to call only a single function, multiple functions are often called.


Same here. Instead of returning the correct function calls, gpt-4o sometimes returns a content string of the form

functions.${function_name}(
  ${args}
)
${actual content}

as if it were calling a JS function.
My (very temporary) solution is to manually parse this string to retrieve the function name and arguments. Here’s the code (Node.js + TypeScript + LangChain):

import type { AIMessage } from '@langchain/core/messages';
import * as jsonc from 'jsonc-parser';

/**
 * Manually parse and rectify a faulty tool-calling response from gpt-4o.
 * @param {AIMessage} msg The assistant message to patch in place.
 * @returns {void}
 */
export function patchFaultyFunctionCall(msg: AIMessage): void {
  if (!msg.content || typeof msg.content !== 'string') {
    return;
  }
  const matchResult = msg.content.match(/functions\.(.*?)[\(\n]/);
  if (!matchResult) {
    return;
  }

  // The string that comes after functions.${function_name}(
  // If the argument is an object, gpt-4o sometimes omits double quotes around attribute names, 
  // making it an invalid JSON string. quotifyJSONString adds the missing quotation marks. 
  const jsonString = quotifyJSONString(msg.content.substring((matchResult.index ?? 0) + matchResult[0].length));
  const [jsonObj, prefixLen] = parseJSONPrefix(jsonString);
  if (prefixLen === 0) {
    return;
  }
  // Keep whatever follows the fake call as content, minus the closing parenthesis.
  msg.content = jsonString.substring(prefixLen).replace(/^\s*\)\s*/, '');
  if (msg.tool_calls === undefined) {
    msg.tool_calls = [];
  }
  msg.tool_calls.push({
    name: matchResult[1],
    args: jsonObj,
  });
}

function quotifyJSONString(unquotedJson: string): string {
  const attributePattern = /([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)(\s*:)/g;

  // Replace unquoted attribute names with quoted ones
  return unquotedJson.replace(attributePattern, '$1"$2"$3');
}

/**
 * Try to parse the prefix of a JSON string into an object.
 * @param {string} str A string that might have a valid JSON prefix.
 * @returns {[any, number]} [The parsed object, size of the valid JSON prefix]
 */
function parseJSONPrefix(str: string): [any, number] {
  const errors: jsonc.ParseError[] = [];
  const obj = jsonc.parse(str, errors);

  if (errors.length === 0) {
    return [obj, str.length];
  }
  if (errors[0].offset === 0) {
    // No valid prefix
    return [undefined, 0];
  }

  return [obj, errors[0].offset];
}
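
Usage is just a post-processing pass on the model’s response before dispatching tools. A minimal sketch (the send_message tool and prompt are placeholders, not my actual setup):

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage } from '@langchain/core/messages';

// Placeholder tool, only here to make the sketch self-contained.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'send_message',
      description: 'Send a response message to the group chat',
      parameters: {
        type: 'object',
        properties: { content: { type: 'string' } },
        required: ['content'],
      },
    },
  },
];

const model = new ChatOpenAI({ model: 'gpt-4o' }).bindTools(tools);
const response = await model.invoke([new HumanMessage('Please reply to the group.')]);

// Rewrite the message in place if gpt-4o printed the call as text instead of returning tool_calls.
patchFaultyFunctionCall(response);

if (response.tool_calls?.length) {
  // dispatch the recovered tool calls as usual
}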

Hi @cbarber713, OAI staff here, and sorry for the late reply. I’d like to get more details on your use case. It sounds like you were using:

  • assistants API v2
  • with tools and tool_choice param

My question is: what value did you pass to tool_choice? Was it "auto", "required", or a specific function like {"type": "function", "function": {"name": "your_func_name"}}?
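
Concretely, these are the three shapes I mean, sketched with the Node SDK (the IDs are placeholders):

import OpenAI from 'openai';

const openai = new OpenAI();

const run = await openai.beta.threads.runs.create('thread_abc123', {
  assistant_id: 'asst_abc123',
  tool_choice: 'auto', // or 'required',
  // or force a specific function:
  // tool_choice: { type: 'function', function: { name: 'your_func_name' } },
});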

Any information that can help me reproduce this bug is highly appreciated.

brian

@turbolucius, @voidptr_t, @mahnoorrana.dev
if any of you can provide details that would help me reproduce this bug, it would help me figure out what’s going on faster. :pray:

Sure, you could try to reproduce it using this repo.


@voidptr_t Thanks for providing the example. I did some digging today and my early hypothesis is that gpt-4o likely requires more explicit and accurate instructions than 4-turbo for function calling.

Since the main issue with your example is that the model chose to output a user message (with the function call in javascript syntax) instead of returning tool_calls, I changed

If you feel they are expecting a response from you, output your response.

to

If you feel they are expecting a response from you, output your response by using a tool.

in the system prompt.

I ran your example 1000 times in a script and noticed that the issue went away (i.e. gpt-4o called the function about as often as gpt-4-turbo).

Feel free to give it a try and let me know how it works for you. We will look deeper into this and will share more findings or good practices soon.
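
If you want to measure the effect yourself, here’s a rough sketch of the kind of loop to run (Chat Completions shown for simplicity; the system prompt and send_response tool are placeholders, not the repro repo):

import OpenAI from 'openai';

const openai = new OpenAI();
const runs = 100;
let toolCallCount = 0;

for (let i = 0; i < runs; i++) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content:
          'If you feel they are expecting a response from you, output your response by using a tool.',
      },
      { role: 'user', content: 'Please respond to the group.' },
    ],
    tools: [
      {
        type: 'function',
        function: {
          name: 'send_response', // placeholder tool
          description: 'Send a response to the group chat',
          parameters: {
            type: 'object',
            properties: { message: { type: 'string' } },
            required: ['message'],
          },
        },
      },
    ],
  });
  // Count runs where the model returned tool_calls rather than plain text.
  if (completion.choices[0].message.tool_calls?.length) {
    toolCallCount++;
  }
}

console.log(`tool_calls returned in ${toolCallCount}/${runs} runs`);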


@cbarber713 Please see my above comment. I’d suggest you play with your system prompt, trying to be as explicit and specific as possible on when you expect the model to use the tool. I’m curious to hear if that helps with your use case.

If you are willing to share some concrete examples of yours, I’m happy to take a look on my side too.


I’m also getting this issue with my custom GPTs.

I’ve tried a lot of different prompting tricks… any help much appreciated.

Try this model:

            model="gpt-4o-2024-05-13",

Sorry for the late reply. I went out of town and got sick and was stuck. Bit of a nightmare but anyways…

I am using auto for tool_choice, as I need the AI to decide on its own when to call the function. For the same reason, I can’t really tell it in the system prompt to use a tool to respond every time. I did try that just as a test, but the issue remains: the AI produces a text response with the function call syntax shown. It seems that no amount of manipulating the system prompt will fix this.

One example is that we have a contact form that the AI can show to the user via a custom function. The form should only be shown when appropriate based on the conversation. I’ve played with the system prompt and the instructions in the custom function. But nothing seems to help. gpt-4-turbo does not have this issue at all.
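
To make it concrete, the function is roughly this shape (the name, description, and parameters here are simplified placeholders, not my real definition):

const showContactFormTool = {
  type: 'function' as const,
  function: {
    name: 'show_contact_form', // placeholder name
    description:
      'Display the contact form to the user. Only call this when the conversation shows ' +
      'a clear intent to get in touch with our team.',
    parameters: {
      type: 'object',
      properties: {
        reason: { type: 'string', description: 'Short reason the form is being shown' },
      },
      required: ['reason'],
    },
  },
};
// tool_choice stays 'auto' because the model has to decide on its own when the form is appropriate.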

Did this solve it for you?