I have an assistant configured with function calls (tools).
The system instruction does not explicitly mention function calling.
Observed behavior:
When using gpt-4, the model correctly detects and invokes the configured functions and fetches the data as expected.
When switching to gpt-4.1 or gpt-4.1-2025-04-24, the model does not invoke any functions, even though the same tools are available and the prompt remains unchanged.
Question / clarification needed:
Is explicit instruction required in the system prompt for GPT-4.1 models to enable or trigger function calling?
Did the tool-calling behavior change between GPT-4 and GPT-4.1?
Are there additional parameters or flags required for GPT-4.1 to allow automatic function invocation?
Expectation:
The model should consistently detect and call available functions when relevant, regardless of minor model version changes, or this requirement should be clearly documented.
You currently have an application in which, by virtue of a function’s apparent usefulness and applicability to the task, GPT-4 will call it, but GPT-4.1 will not.
When using a model that is 1/15th the cost for input and 2/15th the cost for output, which correlates with the underlying computation, and for which multiple developers have reported further degradation since release, you do not observe the same function-calling pattern on the Assistants API endpoint.
First round of verification for you to perform
Functions placed into the AI’s context consume tokens that you are billed for, roughly equivalent to a natural-language rendering of the schema text.
Verify that you face a larger bill (higher prompt token usage) when including the functions on the new model.
Verify that the AI can answer, “What functions and function methods can you call?”
Are the functions even present, or is there a bug in your integration?
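The checks above can be sketched locally before touching the API. This is a hedged illustration: `get_weather` is a hypothetical function name, and the character-count heuristic only approximates the real token overhead; the definitive check is comparing billed `prompt_tokens` with and without the tool attached.

```python
import json

# Hypothetical function schema, in the "tools" format used by
# Chat Completions and the Assistants API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city. "
                       "Use whenever the user asks about weather conditions.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            },
            "required": ["city"],
        },
    },
}

# Rough proxy for the extra billed tokens: the serialized schema length.
# The injected text the model actually sees differs, but the order of
# magnitude is similar (~4 characters per token heuristic).
schema_text = json.dumps(get_weather_tool["function"], indent=2)
approx_overhead_tokens = len(schema_text) // 4
print(f"Schema adds roughly {approx_overhead_tokens} tokens of context overhead")

# To confirm the functions actually reach the model, send the same request
# with and without tools=[get_weather_tool] and compare
# response.usage.prompt_tokens; the with-tools count should be larger.
```

If the billed prompt tokens do not grow when tools are attached, the functions are not reaching the model at all, and no amount of prompt tuning will help.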
Tuning up functions
While the gpt-4 of 2023 is intuitive in language completion and in-context learning, gpt-4.1 is driven by instruction-following and behaves more like a “chat” model.
You must focus your effort on improving the function if it is present but not being called when warranted.
Provide as the tool description: what the function does, what it will return, and when the AI shall use it, in no uncertain terms. You can write a long multi-line description so that, based on user input, the AI can decide applicability and then also fill the properties correctly.
Test: craft a user query where the usefulness and necessity of each function cannot be denied. Will the AI call the tool then? Will it infer the arguments?
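A sketch of such a test, under stated assumptions: `get_order_status` is a hypothetical function, and the payload is only constructed here, not sent. Forcing the call via `tool_choice` separates “the schema is broken” from “the model chose not to call”.

```python
# Hypothetical request payload illustrating a strong multi-line tool
# description plus an unambiguous user query. Not sent to any API here.
request = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "user", "content": "Where is my order #1234?"},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": (
                "Look up the shipping status of a customer order by order "
                "number. Returns carrier, current location, and estimated "
                "delivery date. Use this whenever the user asks where an "
                "order is, when it will arrive, or whether it has shipped."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "order_number": {
                        "type": "string",
                        "description": "The order number, e.g. '1234'",
                    },
                },
                "required": ["order_number"],
            },
        },
    }],
    # "auto" lets the model decide; while diagnosing, temporarily force
    # the call to verify the schema itself is usable:
    "tool_choice": {"type": "function", "function": {"name": "get_order_status"}},
}
print(request["tool_choice"])
```

If the forced call succeeds but `tool_choice: "auto"` never triggers it, the description is the thing to strengthen, not the parameters.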
Answers
It is better to focus all effort on the function schema, so that the specification can be placed in any Assistant independently and be found useful.
The tool-call mechanism did not change, but the API surface has. If you are using “tools” → functions, rather than direct calls with the deprecated “functions” parameter (as on Chat Completions), you have already adapted.
GPT-4-0613 is the initial release of function calling on the API. It cannot make parallel tool calls, and no additional tool for emitting functions in parallel is placed into its context. That is the only significant difference. On an endpoint that supports it, you would disable parallel tool calling by API parameter to give a newer AI model, such as 2025’s gpt-4.1, the same input.
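A hedged sketch of that parameter in a request payload. `get_time` is a hypothetical example tool, and the payload is only constructed, not sent.

```python
# Hypothetical example tool.
time_tool = {
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current time in a given IANA timezone.",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
}

# On endpoints that accept it (Chat Completions, and runs on the
# Assistants API), setting parallel_tool_calls to False removes the
# extra parallel-call machinery from the model's context, approximating
# the single-call input that gpt-4-0613 received.
request_kwargs = {
    "model": "gpt-4.1",
    "messages": [{"role": "user", "content": "What time is it in Tokyo?"}],
    "tools": [time_tool],
    "parallel_tool_calls": False,
}
print(request_kwargs["parallel_tool_calls"])
```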
Wrap-up
It is unavoidable that models separated by multiple years in release date, and differing in training, reinforcement learning, and ultimately size, will perform differently in almost every respect. You will need to tune your input language to match their behaviors.