Hi all, I have been using GPT-5 through the API to build a multi-agent system, but that’s not the important part. A super annoying issue I keep running into is “fake” tool calls in the assistant response. My main agent has only one tool, delegate_task, and here is an example of how it fails:
Sure! First, I’ll delegate this task to the **web_browser_agent** to read and summarize the article.
```json
{"name":"functions.delegate_task","arguments":{"agent_name":"web_browser_agent","query":"Read and summarize the website https://example.com/article1"}}
```
I’ll delegate this task to the **web_browser_agent** to read and summarize the article.
So rather than actually calling the delegate_task tool, it just outputs a JSON markdown code block, or it outputs a Python-style function call with no fence at all. Is there some way to make function calling more reliable? I know there is an option to force function calling, but I don’t always want that, because sometimes the agent should be able to respond with a plain text message.
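For reference, the tool schema is roughly the following, written out as a Python dict in the Chat Completions style (the descriptions here are paraphrased, not my exact wording):

```python
# Rough shape of the single tool the orchestrator sees (descriptions paraphrased).
delegate_task_tool = {
    "type": "function",
    "function": {
        "name": "delegate_task",
        "description": "Delegate a task to one of the available subagents.",
        "parameters": {
            "type": "object",
            "properties": {
                "agent_name": {
                    "type": "string",
                    "description": "Name of the subagent, e.g. web_browser_agent.",
                },
                "query": {
                    "type": "string",
                    "description": "The task for the subagent, in plain language.",
                },
            },
            "required": ["agent_name", "query"],
            "additionalProperties": False,
        },
    },
}
```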
I am also providing some few-shot examples in the system prompt, so maybe that’s what is confusing it? I format the examples in the system prompt like this:
```
User: Summarize example.com/article1
Assistant: Okay let me ask the web_browser_agent to summarize it
delegate_task("web_browser_agent", "Please summarize example.com/article1")
Tool response: This article is about ...
Assistant: This article is about ...
```
I saw some previous posts saying to “fake” a tool call and make that the example response, but then it isn’t a system prompt anymore. Is there a real solution to this yet? Or are there some magic keywords I can put in my prompt?
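For context, my understanding of what those posts suggest is roughly this: instead of writing the example as prose in the system prompt, seed the conversation history with a real tool-call turn and its tool result (sketch only, IDs and content made up):

```python
# Few-shot example expressed as real message turns rather than system-prompt prose.
few_shot_messages = [
    {"role": "user", "content": "Summarize example.com/article1"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_example_1",
                "type": "function",
                "function": {
                    "name": "delegate_task",
                    "arguments": '{"agent_name": "web_browser_agent", '
                                 '"query": "Please summarize example.com/article1"}',
                },
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_example_1", "content": "This article is about ..."},
    {"role": "assistant", "content": "This article is about ..."},
]
```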
I’m not sure there is even a formalized “function” being used, based on the messages shown. Not doubting that AI models can fail spectacularly at tool use, though.
Show the function you are attempting here on the forum, and the misuse might be apparent to all. You can also say something like: “No preamble, no final channel; you emit to this tool internally, without any discussion of your intention directed at the user.”
A function, and its description, should follow a pattern: it provides a service or action the AI would naturally find useful in conjunction with a user input, and it returns language in a return message that informs how the AI can respond better based on that text. Most of all, functions are not an output format; they are something useful, and useful only some of the time.
Searching the web should be its own tool; it shouldn’t take a messy hierarchy to get there. If, for example, I were going to have ‘gpt-4o-search-preview’ provide my web searching so I maintain control of the main model, I would write a tool specifically for what I want emitted into part of the prompt, and describe its output and how it should be used.
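A minimal sketch of what I mean, assuming the standard Python SDK; the tool name, descriptions, and handler here are only illustrative:

```python
from openai import OpenAI

client = OpenAI()

# The only thing the main model sees: "search the web", nothing about agents.
search_tool = {
    "type": "function",
    "function": {
        "name": "search_web",
        "description": (
            "Search the web for current information. Returns a text summary "
            "with source URLs; use that text to answer the user."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "What to search for."}
            },
            "required": ["query"],
        },
    },
}

def run_search_web(query: str) -> str:
    """Handler: a search-enabled model does the browsing, and the main model
    only ever sees the returned text in the tool result message."""
    result = client.chat.completions.create(
        model="gpt-4o-search-preview",
        messages=[{"role": "user", "content": query}],
    )
    return result.choices[0].message.content
```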
An AI “orchestrator” should not be consuming user chat inputs directly. Instead, you likely want a single-purpose AI with one job: a system prompt and user prompt with strong containerization around the input being examined, and structured outputs enforced as the final, no-return destination of the AI’s job of judging which specialist AI should be invoked.
Then consider that gpt-5 is a reasoning model: it must reason with the single purpose of emitting to the right anyOf schema or picking the correct enum, and its internal reasoning adds further confusion at a high temperature you can’t change (though you can send it at value 1).
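A minimal sketch of that routing step with Structured Outputs, assuming the Python SDK; the agent names in the enum are placeholders for whatever specialists you run:

```python
import json
from openai import OpenAI

client = OpenAI()

# Router with exactly one job: pick the specialist. Strict schema, enum of
# agent names, no tools, no opportunity to chat with the user.
routing_schema = {
    "name": "route_task",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "agent": {
                "type": "string",
                "enum": ["web_browser_agent", "file_system_agent"],
            },
            "task": {"type": "string"},
        },
        "required": ["agent", "task"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "Decide which specialist should handle the input below."},
        {"role": "user", "content": "Tell me the top 7 highest mountain peaks in the world"},
    ],
    response_format={"type": "json_schema", "json_schema": routing_schema},
)

decision = json.loads(response.choices[0].message.content)
```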
@kinghypnos I saw this last week but can no longer repro it. I can describe what I was seeing and how I addressed it, though. I had a series of tools, and my response instructions asked the model to produce XML back to the user.
What “solved” it for me was the following. I will probably try to remove these fixes on my next refactor because they feel like work-arounds:
gpt-5 accepts a “temperature” parameter, but it must be set to 1 or the API call throws an error. What I originally found was that REMOVING this parameter behaved differently from merely setting it to 1. Some other forum members challenged this, so I retested, and it no longer appears to be the case (thank you @merefield).
I look for “.” in the tool name and bounce the response back to the model by adding a conversation turn asking it to correct the error. Yuck, but it got the application back up and running, and I’ll follow up with a retest to remove the hack if things work without it.
I also look for “parallel” or “functions.” in the user-facing output and bounce those back to the model. Another yuck. A rough sketch of both checks is below.
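Roughly what those two checks amount to (a simplified sketch, not my exact code; it assumes a Chat Completions message object):

```python
def check_response(message) -> str | None:
    """Return a correction to bounce back to the model, or None if clean."""
    # Check 1: a tool call whose name contains "." (e.g. "functions.delegate_task").
    for call in (message.tool_calls or []):
        if "." in call.function.name:
            return (
                f"The tool name '{call.function.name}' is invalid. "
                "Call the tool again using its bare name, with no prefix."
            )
    # Check 2: tool-call syntax leaking into the user-facing text.
    text = message.content or ""
    if "functions." in text or "parallel" in text:
        return (
            "Your reply contains raw tool-call text instead of an actual tool call. "
            "Use the tool interface; do not describe the call in prose."
        )
    return None
```

If it returns a string, I append it as an extra turn and re-run the request.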
Hope it helps. I don’t like to propagate campfire knowledge but these patches got me running…
Thanks for your response @merefield! Basically I am trying to create a two-tiered agent system. The top level is an “orchestrator” agent that can call different subagents. An example task for my web-browsing subagent is “Tell me the top 7 highest mountain peaks in the world”, and the subagent uses its specialized Google search/scraping/etc. tools. I like this method because the main agent doesn’t have to “pollute” its context to get this information. I also like the delegate_task method because it will let the agent create its own subagents in the future (e.g. a graphing agent, an email agent) without blowing up the number of tools.
I also provide descriptions for the two agents I have at the end of the prompt. It’s something like this:
# Agents
- **file_system_agent**: To interact with local files.
- **web_browser_agent**: To browse the web.
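In code terms, delegate_task is basically a lookup into a registry of subagents, very roughly like this (the handler bodies are placeholders; the real thing goes through the SDK):

```python
def run_file_system_agent(query: str) -> str:
    """Placeholder: the real subagent has its own model, prompt, and file tools."""
    ...

def run_web_browser_agent(query: str) -> str:
    """Placeholder: the real subagent has its own search/scraping tools."""
    ...

# The orchestrator only ever sees one tool; new subagents are just new entries here.
SUBAGENTS = {
    "file_system_agent": run_file_system_agent,
    "web_browser_agent": run_web_browser_agent,
}

def delegate_task(agent_name: str, query: str) -> str:
    agent = SUBAGENTS.get(agent_name)
    if agent is None:
        return f"Unknown agent '{agent_name}'. Available: {', '.join(SUBAGENTS)}"
    return agent(query)
```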
I could rework it so the model calls web_browser_agent(...) directly, which would make the function name correlate more closely with the function, but I’m a little concerned it would increase the number of functions too drastically. Maybe that scaling is a problem for later.
Oh thank you @_j, this structured outputs solution is really clever; let me try it and get back to you. In my response to merefield I’ve described things a little more, but I am basically using the strands SDK’s agent-as-a-tool pattern (I can’t link it, but it’s easy to search up). They have a couple of examples on their page that I am basically copying now.