GPT-4o cannot properly call custom functions more than half the time

When GPT-4o tries to call a custom function, it frequently doesn’t actually call the function. Instead, it prints the function call in the normal assistant response.

Sometimes it works fine, but most of the time it prints the function call and any arguments that go with it right in the assistant response. This makes the model unusable when you need custom functions to work, not to mention it confuses the hell out of users.

What can I do? It doesn’t seem to matter how I word the function description. GPT-4-turbo, GPT-4, and GPT-3.5-turbo all work fine with custom functions in my code.


Are you setting the function_call parameter to force it to use a given function for chat completions, or the equivalent tool_choice param for thread runs?
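In other words, something like this rough sketch with the Node SDK (the get_weather tool is just a placeholder; tool_choice is the current replacement for the deprecated function_call parameter):

import OpenAI from 'openai';

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is the weather in Paris?' }],
  tools: [
    {
      type: 'function',
      function: {
        name: 'get_weather', // placeholder function
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city'],
        },
      },
    },
  ],
  // Force this specific function instead of letting the model decide ('auto').
  tool_choice: { type: 'function', function: { name: 'get_weather' } },
});

console.log(completion.choices[0].message.tool_calls);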


Yes, using the function call parameter.

I should mention that I am using the Assistants API v2

Having a similar issue.

My assistants keep saying “Okay, I’m going to [use the function] now” and then they don’t. GPT-4-Turbo didn’t seem to have this problem as often.

FYI I’m not having this issue with Chat Completions.

I’m having the exact opposite issue. GPT-4o keeps trying to call a non-existent function instead of just responding in a LangGraph group chat. The agent that keeps failing has only one tool (which it uses correctly), but it then calls a non-existent tool.

I’ve tried playing with prompts all over the place to resolve this, but it consistently messes up.

I’m using Chat Completions and it’s correctly returning a function call, but I’m definitely seeing worse performance than gpt-4. It’s generally making worse decisions, not obeying enums as well as before, and often calling only one function where multiple are needed.
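(For context, by “enums” I mean constrained parameter values in the tool schema; a hypothetical example of the kind of definition gpt-4o now drifts outside of:)

const setStatusTool = {
  type: 'function' as const,
  function: {
    name: 'set_status', // hypothetical function
    description: 'Update the ticket status',
    parameters: {
      type: 'object',
      properties: {
        // gpt-4 stuck to these values; gpt-4o returns values outside the list more often.
        status: { type: 'string', enum: ['open', 'pending', 'closed'] },
      },
      required: ['status'],
    },
  },
};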

I am using the GPT Assistants API and facing the exact same issue. Did you find out what was wrong with it?

What’s the solution to this?

I agree, and I find that function calling in GPT-4o is unusable compared to GPT-4 Turbo. In some cases I’m seeing the same function called two or three times, with the parameters correct for one call and completely made up for the others. I’m also seeing function calls returned in the assistant response. And when instructed in the system message to call only a single function, multiple functions are often called.


Same here. Instead of returning the correct function calls, gpt-4o sometimes returns a content string of the form

functions.${function_name}(
  ${args}
)
${actual content}

as if it were calling a JS function.
My (very temporary) solution is to manually parse this string to retrieve the function name and arguments. Here’s the code (Node.js + TypeScript + LangChain):

import type { AIMessage } from '@langchain/core/messages';
import * as jsonc from 'jsonc-parser';

/**
 * Manually parse and rectify a faulty tool-calling response from gpt-4o.
 * @param {AIMessage} msg The assistant message to patch in place.
 * @returns {void}
 */
export function patchFaultyFunctionCall(msg: AIMessage): void {
  if (!msg.content || typeof msg.content !== 'string') {
    return;
  }
  const matchResult = msg.content.match(/functions\.(.*?)[\(\n]/);
  if (!matchResult) {
    return;
  }

  // The string that comes after functions.${function_name}(
  // If the argument is an object, gpt-4o sometimes omits double quotes around attribute names, 
  // making it an invalid JSON string. quotifyJSONString adds the missing quotation marks. 
  const jsonString = quotifyJSONString(msg.content.substring((matchResult.index ?? 0) + matchResult[0].length));
  const [jsonObj, prefixLen] = parseJSONPrefix(jsonString);
  if (prefixLen === 0) {
    return;
  }
  // Keep whatever follows the fake call as content, minus the closing parenthesis.
  msg.content = jsonString.substring(prefixLen).replace(/^\s*\)\s*/, '');
  if (msg.tool_calls === undefined) {
    msg.tool_calls = [];
  }
  msg.tool_calls.push({
    name: matchResult[1],
    args: jsonObj,
  });
}

function quotifyJSONString(unquotedJson: string): string {
  const attributePattern = /([{,]\s*)([a-zA-Z_][a-zA-Z0-9_]*)(\s*:)/g;

  // Replace unquoted attribute names with quoted ones
  return unquotedJson.replace(attributePattern, '$1"$2"$3');
}

/**
 * Try to parse the prefix of a JSON string into an object.
 * @param {string} str A string that might have a valid JSON prefix.
 * @returns {[any, number]} [The parsed object, size of the valid JSON prefix]
 */
function parseJSONPrefix(str: string): [any, number] {
  const errors: jsonc.ParseError[] = [];
  const obj = jsonc.parse(str, errors);

  if (errors.length === 0) {
    return [obj, str.length];
  }
  if (errors[0].offset === 0) {
    // No valid prefix
    return [undefined, 0];
  }

  return [obj, errors[0].offset];
}
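
Usage is just a post-processing pass on the model’s response before dispatching tools. A minimal sketch (the send_message tool and prompt are placeholders, not my actual setup):

import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage } from '@langchain/core/messages';

// Placeholder tool, only here to make the sketch self-contained.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'send_message',
      description: 'Send a response message to the group chat',
      parameters: {
        type: 'object',
        properties: { content: { type: 'string' } },
        required: ['content'],
      },
    },
  },
];

const model = new ChatOpenAI({ model: 'gpt-4o' }).bindTools(tools);
const response = await model.invoke([new HumanMessage('Please reply to the group.')]);

// Rewrite the message in place if gpt-4o printed the call as text instead of returning tool_calls.
patchFaultyFunctionCall(response);

if (response.tool_calls?.length) {
  // dispatch the recovered tool calls as usual
}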

Hi @cbarber713, OAI staff here, and sorry for the late reply. I’d like to get more details on your use case. It sounds like you were using:

  • assistants API v2
  • with tools and tool_choice param

My question is: what value did you pass to tool_choice? Was it "auto", "required", or a specific function like {"type": "function", "function": {"name": "your_func_name"}}?
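
Concretely, these are the three shapes I mean, sketched with the Node SDK (the IDs are placeholders):

import OpenAI from 'openai';

const openai = new OpenAI();

const run = await openai.beta.threads.runs.create('thread_abc123', {
  assistant_id: 'asst_abc123',
  tool_choice: 'auto', // or 'required',
  // or force a specific function:
  // tool_choice: { type: 'function', function: { name: 'your_func_name' } },
});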

Any information that can help me reproduce this bug is highly appreciated.

brian

@turbolucius, @voidptr_t, @mahnoorrana.dev
if any of you can provide details that would help me reproduce this bug, it would help me figure out what’s going on faster. :pray:

Sure, you could try to reproduce it using this repo.


@voidptr_t Thanks for providing the example. I did some digging today and my early hypothesis is that gpt-4o likely requires more explicit and accurate instructions than 4-turbo for function calling.

Since the main issue with your example is that the model chose to output a user message (with the function call in javascript syntax) instead of returning tool_calls, I changed

If you feel they are expecting a response from you, output your response.

to

If you feel they are expecting a response from you, output your response by using a tool.

in the system prompt.

I ran your example 1000 times in a script and noticed that the issue went away (i.e. gpt-4o called the function about as often as gpt-4-turbo).

Feel free to give it a try and let me know how it works for you. We will look deeper into this and will share more findings or good practices soon.
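
If you want to measure the effect yourself, here’s a rough sketch of the kind of loop to run (Chat Completions shown for simplicity; the system prompt and send_response tool are placeholders, not the repro repo):

import OpenAI from 'openai';

const openai = new OpenAI();
const runs = 100;
let toolCallCount = 0;

for (let i = 0; i < runs; i++) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [
      {
        role: 'system',
        content:
          'If you feel they are expecting a response from you, output your response by using a tool.',
      },
      { role: 'user', content: 'Please respond to the group.' },
    ],
    tools: [
      {
        type: 'function',
        function: {
          name: 'send_response', // placeholder tool
          description: 'Send a response to the group chat',
          parameters: {
            type: 'object',
            properties: { message: { type: 'string' } },
            required: ['message'],
          },
        },
      },
    ],
  });
  // Count runs where the model returned tool_calls rather than plain text.
  if (completion.choices[0].message.tool_calls?.length) {
    toolCallCount++;
  }
}

console.log(`tool_calls returned in ${toolCallCount}/${runs} runs`);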


@cbarber713 Please see my above comment. I’d suggest you play with your system prompt, trying to be as explicit and specific as possible on when you expect the model to use the tool. I’m curious to hear if that helps with your use case.

If you are willing to share some concrete examples of yours, I’m happy to take a look on my side too.


I’m also getting this issue with my custom GPTs.

I’ve tried a lot of different prompting tricks… any help much appreciated.

Try this model:

            model="gpt-4o-2024-05-13",

Sorry for the late reply. I went out of town and got sick and was stuck. Bit of a nightmare but anyways…

I am using auto for tool_choice, as I need the AI to decide on its own when to call the function. For the same reason, I can’t really tell it in the system prompt to use a tool to respond every time. I did try that just as a test, but the issue remains: the AI produces a text response with the function call syntax shown. It seems that no amount of manipulating the system prompt will fix this.

One example is that we have a contact form that the AI can show to the user via a custom function. The form should only be shown when appropriate based on the conversation. I’ve played with the system prompt and the instructions in the custom function. But nothing seems to help. gpt-4-turbo does not have this issue at all.
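
To make it concrete, the function is roughly this shape (the name, description, and parameters here are simplified placeholders, not my real definition):

const showContactFormTool = {
  type: 'function' as const,
  function: {
    name: 'show_contact_form', // placeholder name
    description:
      'Display the contact form to the user. Only call this when the conversation shows ' +
      'a clear intent to get in touch with our team.',
    parameters: {
      type: 'object',
      properties: {
        reason: { type: 'string', description: 'Short reason the form is being shown' },
      },
      required: ['reason'],
    },
  },
};
// tool_choice stays 'auto' because the model has to decide on its own when the form is appropriate.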

Did this solve it for you?