Complex Function Calling Scenarios

OpenAI Community,

Hello. As described here and here, the basic steps for function calling are:

  1. Call the model with the user query and a set of functions defined in the tools parameter.
  2. The model can choose to call one or more functions; if so, the function-call arguments will be a stringified JSON object adhering to your custom schema (note: the model may hallucinate parameters).
  3. Parse the string into JSON in your code, and call your function with the provided arguments if they exist.
  4. Call the model again by appending the function response as a new message, and let the model summarize the results back to the user.
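Step 3 above can be sketched offline as a parse-and-dispatch routine (a minimal sketch: the get_current_weather tool and its arguments are illustrative, not from the docs quoted above, and the guards cover the hallucinated-parameters caveat):

```python
import json

# Hypothetical local implementation of a tool declared in the `tools` parameter.
def get_current_weather(location, unit="celsius"):
    return {"location": location, "temperature": 21, "unit": unit}

# Registry mapping declared tool names to local Python callables.
AVAILABLE_FUNCTIONS = {"get_current_weather": get_current_weather}

def dispatch_tool_call(name, arguments_json):
    """Step 3: parse the stringified JSON arguments and call the function.

    Guards against two failure modes: the model may name an unknown
    function, or emit malformed or mismatched parameters.
    """
    func = AVAILABLE_FUNCTIONS.get(name)
    if func is None:
        return {"error": f"unknown function: {name}"}
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError:
        return {"error": "arguments were not valid JSON"}
    try:
        return func(**arguments)
    except TypeError as e:
        return {"error": f"bad parameters: {e}"}

# Simulated model output from step 2 (a stringified JSON object):
result = dispatch_tool_call("get_current_weather", '{"location": "Paris"}')
```

The returned dict (result or error) is what step 4 would append as the function-response message.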

I was recently brainstorming about step 3 and about more complex functions which could be invoked with a dialog context or dialog thread context object as a special argument. Some functions might want to chat with AI systems to produce results, perhaps in new (forked, cloned, or copied) dialog threads.
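Forking a dialog thread could, for now, be sketched in client code as a deep copy of the message list (a minimal sketch; the fork_thread helper and its system_note parameter are illustrative, not an existing API feature):

```python
import copy

def fork_thread(messages, system_note=None):
    """Clone a dialog thread so that a function can chat with an AI
    system in the forked context without mutating the original thread.

    `messages` is a list of {"role": ..., "content": ...} dicts;
    `system_note` optionally seeds the fork with subtask context.
    """
    forked = copy.deepcopy(messages)
    if system_note is not None:
        forked.append({"role": "system", "content": system_note})
    return forked

thread = [{"role": "user", "content": "Design a 3D gear."}]
fork = fork_thread(thread, system_note="Subtask: clarify gear parameters.")
# The original thread is unchanged; only the fork carries the extra note.
```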

I was also brainstorming about step 4, and about how some functions might want to provide AI systems with, beyond or instead of output results, messages, information, warnings, errors, explanations, or usage instructions.

That is, beyond or instead of providing output results, some functions might want to add content to the dialog thread context. For example: “you must first select a 3D object before calling the move_selected_object function.”

This type of content need not be displayed to end-users but could be useful for AI systems. AI systems could, for the running example, respond by next selecting a 3D object and then reinvoking the function. Perhaps AI systems could learn about software applications from analyzing their multimodal interactions with them.
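One way to sketch this is a result envelope in which a function returns either an output or guidance addressed to the AI system (the move_selected_object function and the scene-state shape are illustrative assumptions, not an existing API):

```python
def move_selected_object(state, dx, dy, dz):
    """Illustrative tool returning an envelope: either a result, or a
    message intended for the calling AI system rather than the end user."""
    if state.get("selected_object") is None:
        return {
            "status": "error",
            "result": None,
            # Guidance for the AI system, per the running example:
            "message": ("you must first select a 3D object before "
                        "calling the move_selected_object function"),
        }
    obj = state["selected_object"]
    obj["position"] = [obj["position"][0] + dx,
                       obj["position"][1] + dy,
                       obj["position"][2] + dz]
    return {"status": "ok", "result": obj["position"], "message": None}

# With nothing selected, the tool states its precondition instead of failing silently:
scene = {"selected_object": None}
reply = move_selected_object(scene, 1, 0, 0)

# With a selection, the tool returns a normal result:
scene2 = {"selected_object": {"position": [0, 0, 0]}}
ok = move_selected_object(scene2, 1, 0, 0)
```

An AI system receiving the error envelope could, as described above, select an object and then reinvoke the function.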

Also, considering the running example, perhaps preconditions and effects could be parts of functions’ natural-language descriptions.
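Embedding preconditions and effects in a tool's natural-language description might look like the following (a sketch using the OpenAI `tools` parameter shape; the tool name, wording, and parameters are illustrative):

```python
# A tool definition whose description states a precondition and an effect
# in natural language, so the model can plan: select an object first,
# then move it.
MOVE_TOOL = {
    "type": "function",
    "function": {
        "name": "move_selected_object",
        "description": (
            "Move the currently selected 3D object. "
            "Precondition: a 3D object must already be selected "
            "(see select_object). "
            "Effect: the selected object's position is translated "
            "by (dx, dy, dz)."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "dx": {"type": "number"},
                "dy": {"type": "number"},
                "dz": {"type": "number"},
            },
            "required": ["dx", "dy", "dz"],
        },
    },
}
```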

Thank you. Is anybody else interested in these topics?

I’ve written something like this. I first have a router with a list of functions and two arguments: user_task, a fully encapsulated description of the task, and function_name. From there I detach the chat history and use only user_task as context for follow-up calls.
The way I have written my code also lets me easily create a longer chain of follow-up calls. For example, if the bot is creating an event and has set the attendance form to enabled, I create a follow-up for creating a form (which is itself a two-step process: first asking for the structure/layout of a set of components, then building a dynamic function based on that structure, with objects carrying various properties for each form field, etc.).
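This router pattern might be sketched roughly as follows (a guess at the described design; the route names, the attendance-form trigger, and the follow-up mechanics are all illustrative):

```python
# Illustrative handlers for routed tasks.
ROUTES = {
    "create_event": lambda task: {"handled": "create_event", "task": task},
    "create_form": lambda task: {"handled": "create_form", "task": task},
}

def router(user_task, function_name):
    """Dispatch on function_name; the chat history is detached, and only
    user_task (a fully encapsulated task description) is carried forward
    as context for follow-up calls."""
    handler = ROUTES.get(function_name)
    if handler is None:
        return {"error": f"unknown function: {function_name}"}
    result = handler(user_task)
    # Follow-up example from the post: enabling an attendance form while
    # creating an event queues a follow-up call that creates the form.
    if function_name == "create_event" and "attendance form" in user_task:
        result["follow_up"] = router(user_task, "create_form")
    return result

out = router("Create a team event with an attendance form enabled",
             "create_event")
```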

But generally speaking all your posts are very cryptic and it’s hard to understand exactly what’s going on.

It could be that some of the ideas above blur the lines between traditional functions and agents, bots, or assistants in multiagent systems (see also: Autogen).

I’m specifically brainstorming about 3D CAD/CAE scenarios and “agent-like functions”. Existing related projects include: GPT × Blender.

Your example, in which an invoked function creates and presents follow-up questions for a user or AI system about the structure and layout of 2D components or form elements, does resemble these ideas.

For example, a user, at a comfortable level of task abstraction, might want to create a 3D gear, and perhaps they are initially vague when expressing that design objective. Through dialog, AI systems could obtain the needed or useful information to complete the task: designing a 3D gear for the user.

I’m thinking about how best to use new OpenAI features for these scenarios. I’m excited to consider that AI systems could request screenshots (or files, maybe 3D object models) as or after they perform tasks for users.

Forking, copying, or cloning dialog threads could be an API feature request, in particular if there is a broader interest.

Additionally or alternatively, multiple AI agents could enter sidebars (nested scopes in dialog contexts) whose content would be initially collapsed for, and subsequently expandable by, users. These or similar UI/UX concepts would allow invoked multiagent dialog-based systems to be both unobtrusive to and inspectable in dialog contexts.

Per these UI/UX concepts about collapsible and expandable trees of messages, threads might, in the future, be tree-like rather than list-like collections of messages.

Alternatively, special-purpose messages resembling pragma directives could be utilized. In one approach, a checkpoint pragma directive could help users navigate lengthier dialogs. In another, arguably better, approach, tree-like scopes and sub-scopes could be delimited via region and endregion pragma directives. This latter approach would resemble features of some programming languages which use similar pragma directives to allow nestable portions of source code to be collapsed and expanded in IDEs.
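Folding such a flat message list into a tree could be sketched as follows (the #region/#endregion marker syntax and the message strings are illustrative assumptions):

```python
def build_tree(messages):
    """Fold a flat list of messages into a tree using #region/#endregion
    pragma-directive messages, so that nested multiagent sidebars can be
    collapsed and expanded in a UI."""
    root = {"label": None, "children": []}
    stack = [root]
    for msg in messages:
        if msg.startswith("#region"):
            scope = {"label": msg[len("#region"):].strip(), "children": []}
            stack[-1]["children"].append(scope)
            stack.append(scope)  # subsequent messages nest inside this scope
        elif msg.startswith("#endregion"):
            stack.pop()  # close the innermost open scope
        else:
            stack[-1]["children"].append(msg)
    return root

thread = [
    "User: design a 3D gear",
    "#region subtask: choose gear parameters",
    "Agent A: how many teeth?",
    "Agent B: 24 teeth, module 2",
    "#endregion",
    "Assistant: here is your 24-tooth gear",
]
tree = build_tree(thread)
```

A UI could then render each scope node collapsed by default, expanding it on demand.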

Regardless of implementation details, with tree-like features for navigating dialogs, multiagent AI conversations occurring in nested subtask scopes on behalf of users could be collapsed (automatically or manually) and subsequently expanded by users via UI/UX techniques.

These features would help to alleviate runaway dialog contexts as multiple AI systems interacted with one another to complete tasks and subtasks on behalf of users. That is, a user wouldn’t have to scroll way back up to find the context of what they were doing before creating a 3D gear.