We’ve recently started using GPT-4o tools/functions. Since GPT-3.5, we’ve mostly used JSON outputs, which work well for most of our tool-based agent flows. However, for our new hybrid text chat/voice AI assistant on our agent workflow platform, it made the most sense to use “tools,” since they’re supported by both the text and voice APIs and provide standardized interactions across modes.
Our assistant needs to support dozens of function calls, so we created different “modes” such as agent_selection_and_subscription, agent_configuration, agent_operations, user_login, etc. Each assistant “mode” has a system prompt and a corresponding array of functions to use in that mode.
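To make that concrete, here’s a minimal sketch of the shape of such a mode registry in TypeScript. The `ChatCompletionTool` type comes from the OpenAI Node SDK, but the prompts and the `select_agent` tool below are illustrative placeholders, not the actual Your Priorities code:

```typescript
import type { ChatCompletionTool } from "openai/resources/chat/completions";

// One "mode" = one system prompt + one array of tools available in that mode.
interface AssistantMode {
  systemPrompt: string;
  tools: ChatCompletionTool[];
}

const modes: Record<string, AssistantMode> = {
  agent_selection_and_subscription: {
    systemPrompt: "Help the user browse, select, and subscribe to agents.",
    tools: [
      {
        type: "function",
        function: {
          name: "select_agent",
          description: "Select an agent by id for the current session.",
          parameters: {
            type: "object",
            properties: { agentId: { type: "string" } },
            required: ["agentId"],
          },
        },
      },
      // switch_mode is appended to every mode at dispatch time; see below.
    ],
  },
  agent_operations: {
    systemPrompt: "Help the user start, stop, and monitor agents.",
    tools: [
      /* start_agent, stop_agent, etc. */
    ],
  },
};
```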
Initially, I planned to create a routing prompt and set up mode switching manually. But then o1-preview suggested I equip each assistant “mode” with the same “switch_mode” tool/function, which makes the model aware of other modes as well.
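In the OpenAI function-calling format, a shared switch_mode tool could look like the following sketch; the description, parameter names, and mode enum here are my assumptions rather than the exact definition we ship:

```typescript
import type { ChatCompletionTool } from "openai/resources/chat/completions";

// The same tool is included in every mode's tool array. The enum of mode
// names is what makes the model aware of the other modes it can move to.
const switchModeTool: ChatCompletionTool = {
  type: "function",
  function: {
    name: "switch_mode",
    description:
      "Switch the assistant to a different mode when the user's request " +
      "is outside the current mode's capabilities.",
    parameters: {
      type: "object",
      properties: {
        mode: {
          type: "string",
          enum: [
            "agent_selection_and_subscription",
            "agent_configuration",
            "agent_operations",
            "user_login",
          ],
          description: "The mode to switch to.",
        },
        reason: {
          type: "string",
          description: "Why the switch is needed.",
        },
      },
      required: ["mode"],
    },
  },
};
```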
I was skeptical at first—is it really that simple? Could I really skip most of the control flow logic? And if I don’t implement it, will I lose too much control?
So far, I’ve mostly been testing the voice side of the hybrid assistant, as it’s much faster, and I’m in awe. The assistant can take a query like “Start this agent” and, if it’s in agent_selection_and_subscription mode, first run the select_agent tool, then the switch_mode tool (which replaces the mode’s system prompt and functions, except for switch_mode itself), then the start_agent tool. It all just works!
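Here’s roughly what that swap looks like on the dispatch side, reusing the hypothetical `modes` registry and `switchModeTool` from the sketches above; the `Session` shape is my assumption, and the real implementation lives in the repo linked at the end:

```typescript
import type { ChatCompletionTool } from "openai/resources/chat/completions";

// Per-conversation state handed to the chat/realtime loop.
interface Session {
  mode: string;
  systemPrompt: string;
  tools: ChatCompletionTool[];
}

// Apply a switch_mode tool call: swap the system prompt and tool array for
// the target mode, keeping switch_mode itself so the model can always route
// onward from any mode.
function applyModeSwitch(session: Session, targetMode: string): void {
  const next = modes[targetMode]; // `modes` registry from the sketch above
  if (!next) throw new Error(`Unknown mode: ${targetMode}`);
  session.mode = targetMode;
  session.systemPrompt = next.systemPrompt;
  session.tools = [...next.tools, switchModeTool];
}
```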
I’m mostly sharing this to highlight my positive experience, but also to ask: does it feel too good to be true? Has anyone experienced any major issues with this type of setup?
Of course, we’re not really deleting anything in the database, and we’ve made it easy to undo any errors the model makes. So far, in limited testing, no major errors have occurred.
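For what it’s worth, the “don’t really delete” safeguard can be as simple as a soft-delete flag; this is a generic sketch of the pattern, not our actual schema:

```typescript
// Generic soft-delete sketch: the model's "delete" tool only stamps a
// timestamp, so any mistaken action can be undone by clearing it.
interface AgentRow {
  id: string;
  deletedAt: Date | null;
}

function softDeleteAgent(row: AgentRow): void {
  row.deletedAt = new Date(); // hide the row instead of issuing a real DELETE
}

function undoDeleteAgent(row: AgentRow): void {
  row.deletedAt = null; // undo is just clearing the flag
}
```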
P.S. Here is the work-in-progress “mode” tools code, part of our open-source Your Priorities agentic engagement platform: