GPT-4o with function calling not reliable from Assistants-API

Hey there,

We created a SaaS application that leverages OpenAI GPT-4o to open tickets on carriers’ websites for 3PL companies.

Recently, the model started promising instead of acting, whereas it had been working without issue for almost a year. (“We will contact …” instead of Function Calling + “We have contacted …”)

We have lots of instructions, and we repeat the part where we ask the model to perform tool calling before responding to customers. But the rate at which it decides not to follow the instructions has increased, making it hard to use.

The prompt is made with XML tags, in the form of a set of rules and steps to follow.

Any idea on how we can reduce the rate at which it decides not to use function calling?

Cheers,
Kevin

OpenAI will deny that they alter the models. Yet production applications continue to break.

Assistants is a front-end to AI inference, so has multiple API specifications that can be haphazardly broken also.

OpenAI also now manipulate the model with injections of their own system messages, which in in the case of vision input triggering this, can be downright degrading of the quality.

The assistants API endpoint has a decided lack of parameters. The only direct mechanism to tune the probability is on chat completions, not using sampling parameters, and using a logit_bias against a discovered internal token number.

It sounds like it is working, but just not smart. You can talk directly to the AI and tell it precisely to send to a named function tool with the described output, and see that it is functional with a direct request.

The approach I would take is to review the function language itself, and ensure the purposes and reasoning for emitting to the function are present directly in the main description field. The style: “You immediately send to this function tool in response to any user input that implies a need for tracking their issues. etc. You cannot reply to the user in solving logistics problems without having sent to this tool first in your conversation and receiving its response with transient id.