More consistent tool calling for GPT-5

kinghypnos · October 6, 2025, 7:21am

Hi all, I have been using GPT 5 through the API to build some multi-agent thing, but that’s not the important part. A super annoying issue I have been running into are “fake” tool calls in the assistant response. my main agent only has one tool delegate_task and here is an example of how it fails:

Sure! First, I’ll delegate this task to the **web_browser_agent** to read and summarize the article.

```json
{"name":"functions.delegate_task","arguments":{"agent_name":"web_browser_agent","query":"Read and summarize the website https://example.com/article1"}}
```

I’ll delegate this task to the **web_browser_agent** to read and summarize the article.

So rather than actually call the delegate_task tool, it just outputs a json markdown code block or it just outputs a python-style function call with no fence. Is there some way to make function calling more reliable? I know there is an option to force function calling, but I don’t always want that because sometimes the agent should be able to respond in a single text string.

I am also providing some few-shot examples in the system prompt so maybe it’s getting confused? I format the system prompt examples like below:

User: Summarize example.com/article1

Assistant: Okay let me ask the web_browser_agent to summarize it
delegate_task("web_browser_agent", "Please summarize example.com/article1")

Tool response: This article is about ...

Assistant: This article is about ...

I saw some other previous posts saying to “fake” a tool call and make this the repsonse, but then it isn’t a system prompt anymore. Is there a real solution to this yet? Or are there some magic keywords I can say in my prompt?

merefield · October 6, 2025, 8:54am

Why “delegate task”? This is a very generic. Might be too generic.

Also your language “browser”? That might not be clear enough.

Just have a function called web_crawl with the query parameter page_url might be more reliable.

Or even summarise_web_page … experiment?

I have two bots with similar functionality and the function names are much more obvious at the top level.

merefield · October 6, 2025, 10:42am

you can’t change temperature on GPT-5 - it’s a reasoning model - this will have no effect.

https://platform.openai.com/docs/guides/latest-model#gpt-5-parameter-compatibility

merefield · October 6, 2025, 10:49am

hmmm, not so sure, it should bark at you for supplying an incompatible parameter, see:

_j · October 6, 2025, 10:57am

I’m not sure if there is even a formalized “function” being used based on the message seen. Not doubting that the AI models can fail spectacularly at tool use, though.

Show the function you are attempting here on the forum, and the misuse might be apparent to all. You can also say, “No preamble, no final channel, you emit to this tool internally without discussion about your intention directed at the user.”

Functions, and the description of them, should have a pattern of providing a service or action the AI would naturally find useful in conjunction with a user input, and that the function would return a language production in a return message that would inform the AI how to better respond based on that text and use. Most of all, functions are not an output, they are something useful, and useful just some of the time.

Searching the web should be its own tool. It shouldn’t take a messy hierarchy to get there. If for example I was going to have ‘gpt-4o-search-preview’ provide my web searching so I maintain control of the main model, I would write a tool specifically with what I want emitted into part of the prompt, and describe the output and how it should be used.

An AI “orchestrator” should not be consuming user chat inputs. Instead, you likely want a single-purpose AI with one job, with a system prompt and user prompt that has strong containerization against the input to be examined, and then structured outputs that are enforced as the final no-return destination of the AI’s job in judging which specialist AI should be invoked

Then you can think about gpt-5 being a reasoning model - that must only reason with a single purpose of emitting to the right anyOf schema or calling the correct enum…and its internal reasoning adding more confusion at high temperature you can’t change (but can send at value: 1).

mcfinley · October 6, 2025, 12:18pm

@kinghypnos I saw this last week but can no longer repro. But I can describe what I was seeing and how I addressed. I had a series of tools and my response instructions asked the model to create XML back to the user.

Instead of seeing tool calls, I was getting tool calls formatted into the user output, similar to what others have reported Tool Calls placed inside content and an even stranger one Model tries to call unknown function multi_tool_use.parallel

what “solved” it for me was the following. I will probably try to remove these fixes on my next refactor because they feel like work-arounds:

gpt-5 accepts a “temperature” parameter but it must be set to 1 or the API call throws an error. What I found is that REMOVING this parameter had different behavior than merely setting it to 1. Some of the other forum members challenged this so I tested it and it no longer appears to be the case (thank you @merefield )
I look for “.” in the tool name and bounce the response back to the model by adding a conversation turn asking it to correct the error. Yuck but it got the application back up and running, and I’ll follow-up with a retest to remove the hack if it works without.
I look for “parallel” or “functions.” in the user output and bounce these back to the model also. Another yuck.

Hope it helps. I don’t like to propagate campfire knowledge but these patches got me running…

Mike

kinghypnos · October 7, 2025, 5:08am

Thanks for your response @merefield! Basically I am trying to create a two-tiered agent system. The top level is a “orchestrator” agent that is able to call these different subagents. An example task I had for my web browsing subagent is “Tell me the top 7 highest mountain peaks in the world” and the subagent would use it’s specialized google search/scraping/etc. tools. I like this method because the main agent doesn’t have to “pollute” it’s context to get this information. I also like this delegate_task method because it will allow the agent to create it’s own subagents in the future (ex: graphing agent, email agent, etc.) without blowing up the number of tools.

I do also provide descriptions for the 2 agents I have at the end of the prompt. It’s something like this:

# Agents
- **file_system_agent**: To interact with local files.
- **web_browser_agent**: To browse the web.

I could rework it to call web_browser_agent(...) to make the function name correlate more directly with the function, but I am a little concerned that it will increase the # of functions too drasticly. Maybe this scaling is a problem for later

kinghypnos · October 7, 2025, 5:13am

Oh thank you @_j this structured outputs solution is really clever, let me try this and get back to you. In my response to merefield I’ve described a little more but I am basically using the strands sdk agent as a tool pattern (I can’t link it but it’s easy to search up). They have a couple examples on their page which I am basically copying now.

Topic		Replies	Views
GPT-4o innate agentic tool use capabilities API assistants-api , gpt-4o	5	2834	December 4, 2024
Force communication through tool API tool , function-calling , agents , assistants-api	4	363	November 22, 2024
How can I use function calling with response format (structured output feature) for final response? Feedback gpt-4 , assistants-api	13	8418	October 27, 2025
What is the exact system prompt that gets inserted when calling a tool? API gpt-4 , chatgpt , api , assistants-api	6	3261	January 1, 2025
Gpt-3.5-turbo-1106 model consistently responds with unnecessary and inappropriate function calls [confirmed BUG JAN 26] Bugs api , tools	9	2518	April 4, 2024

More consistent tool calling for GPT-5

Related topics