Feature Request: First-class Object References for Function Calls

Hello. I would like to request a feature: an API capability for large language models to manage described first-class objects, via canonical object references, while participating in unfolding dialog, and to use these objects in function calls.

By “managing object references”, I mean obtaining references to first-class objects (things or values returned from previous function invocations) and passing them as input arguments to subsequent function invocations, including after interspersed rounds of dialog or other joint tasks.

As envisioned, in addition to described functions with described parameters, prompts to large language models could come to include described first-class objects, or things.

Natural-language descriptions of those contextually relevant objects would, ideally, allow LLMs to determine which specific objects to provide as arguments to invoked functions in response to users’ natural-language input.

Described objects could be canonically referenced using either URIs or GUIDs. These canonical identifiers could be recognized by interoperating software applications when function calls were relayed to them.
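For example, a block of described, canonically identified objects accompanying the function descriptions might look something like the following; the field names and the URN-style GUID scheme are hypothetical, purely for illustration:

```python
# Hypothetical sketch: alongside tool/function definitions, the prompt could
# carry a block of described, canonically identified objects. The field names
# ("ref", "description") are assumptions, not an existing API.
described_objects = [
    {"ref": "urn:uuid:6f1c2a9e-6f0d-4a42-9c3b-0b6f8f1d2a71",
     "description": "a red cube, 10 cm, resting at the origin"},
    {"ref": "urn:uuid:2d8e4b77-1c55-4f0a-8d34-5a9e7c0b3f12",
     "description": "a green cube, 10 cm, to the left of the red cube"},
]
```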

For a concrete example, a user could engage in dialog to create four objects: a red cube, a green cube, a blue cube, and a yellow cube. Through dialog, the user could instruct an LLM to stack and unstack the cubes, referring to them in various ways.

A create_cube(color) function would return a first-class object reference that could be utilized in subsequent dialog moves and function calls. The functions to move the cubes around would accept canonical object references as their arguments.
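For illustration, here is a minimal sketch of how the cube example might be approximated today with the existing function-calling format, where the application keeps the authoritative object registry and the model only ever sees and repeats the reference strings. The function names, fields, and GUID scheme below are hypothetical:

```python
import uuid

# Sketch of the cube example with today's chat-completions function calling.
# create_cube returns a canonical reference (a GUID string) that the model can
# pass back into later calls; the application keeps the authoritative registry.
objects = {}  # object reference (GUID string) -> application-side object

def create_cube(color: str) -> dict:
    ref = str(uuid.uuid4())
    objects[ref] = {"type": "cube", "color": color, "stacked_on": None}
    return {"object_ref": ref}  # returned to the model as the tool result

def stack_cube(top_ref: str, bottom_ref: str) -> dict:
    objects[top_ref]["stacked_on"] = bottom_ref
    return {"ok": True}

# Tool schemas in the existing chat-completions format; the model never holds
# the objects themselves, only their canonical references.
tools = [
    {"type": "function", "function": {
        "name": "create_cube",
        "description": "Create a cube and return its canonical object reference.",
        "parameters": {"type": "object",
                       "properties": {"color": {"type": "string"}},
                       "required": ["color"]}}},
    {"type": "function", "function": {
        "name": "stack_cube",
        "description": "Stack one cube on another, given their object references.",
        "parameters": {"type": "object",
                       "properties": {"top_ref": {"type": "string"},
                                      "bottom_ref": {"type": "string"}},
                       "required": ["top_ref", "bottom_ref"]}}},
]
```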

LLMs are already capable of these kinds of tasks and reasoning, as evidenced by their source-code generation functionalities. This feature request pertains to the chat completions API.

Is anyone else interested in these scenarios? Are there other approaches for developing the indicated example (which hopefully scale towards a “copilot for CAD/CAE” domain)? Any thoughts on these ideas about describing first-class objects alongside described functions with described parameters in LLM prompts? Thank you.

P.S.: See also Large Language Models.

I’m working on CAD integration. Function calling covers most of it for me, and combining regular chats with task-specific assistants works nicely.

I’m storing small files that it uses to keep track of states and tasks, combined with RAG, to help it maintain a good world model ‘in its head’ and work through bigger tasks. Reasoning in 3D space is difficult for GPT-4 but not impossible.
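The state-file piece is roughly like this, heavily simplified (the file name and fields are just for illustration, not what I actually use):

```python
import json
from pathlib import Path

# Rough sketch of the small-state-file idea: the model reads and writes a JSON
# scratchpad through two functions, and the file doubles as material for RAG.
STATE = Path("world_state.json")

def read_state() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {}

def write_state(patch: dict) -> dict:
    state = read_state()
    state.update(patch)  # shallow merge of whatever the model wants to remember
    STATE.write_text(json.dumps(state, indent=2))
    return {"ok": True, "keys": sorted(state)}
```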

Getting GPT-4 to define specifications and approaches works well, as it knows what it likes. I’m starting to use Blender as its ‘code interpreter’ for 3D; I had it write an add-on for Blender that lets it run scripts and get results. It can use this for performing 3D operations and helping it reason about things, like using a calculator, if you see what I mean. I’ve got a lot of systems integration and automation background, so this is like a dream come true.
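For anyone curious, the basic shape of the Blender side is something like this (a stripped-down sketch, not the actual add-on; the file paths and polling interval are placeholders):

```python
import bpy, json, os, traceback

# Bare-bones shape of the idea: poll for a script file, exec it inside Blender,
# and write whatever it put in `result` back out as JSON so the model can read
# the outcome through another function call. In a real add-on this would be
# registered from register(), with proper error handling and locking.
SCRIPT, RESULT = "/tmp/llm_script.py", "/tmp/llm_result.json"

def run_pending_script():
    if os.path.exists(SCRIPT):
        try:
            scope = {"bpy": bpy, "result": None}
            with open(SCRIPT) as f:
                exec(f.read(), scope)
            out = {"ok": True, "result": scope["result"]}
        except Exception:
            out = {"ok": False, "error": traceback.format_exc()}
        os.remove(SCRIPT)
        with open(RESULT, "w") as f:
            json.dump(out, f, default=str)
    return 2.0  # returning a float reschedules the timer (poll every 2 s)

bpy.app.timers.register(run_pending_script)
```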

Very interested in this; I’d be curious to hear how it goes for you.

(Among the many, many applications in a small studio like mine is a voice-controlled CNC machine - ‘Hey, grab the bullnose and put it a hair above the center of the workpiece.’ - ‘Can you rough out that thing we’ve been working on? I’ve got a chunk of foam on the bed ready to go.’)


@moonlockwood, hello. I am also interested in exploring natural-language, structured, and hybrid-structured-document design specifications for AI systems and LLMs.

I have also done some research into world models, mental imagery, imagination, and visuospatial reasoning, including as these topics pertain to AI.

I am interested in whether and how AI systems like GPT-4V might make use of computer vision as or after they perform design and engineering tasks. In these regards, AI systems could be provided with images, renders, image sequences, video clips, and/or 3D object models, including as a result of requesting these (perhaps through one or more function calls).

Designers and engineers will, I think, want to move seamlessly back and forth between using existing user-interface components and techniques and interacting with AI assistants. While doing this, they might expect AI assistants to be able to answer questions and provide assistance as if aware of what the users were doing in the CAD/CAE applications.

In these regards, I am thinking about the role that narrative could play.

Interoperating software applications could describe and detail, into the multimodal dialog transcripts, what users were doing when utilizing legacy user-interface components and techniques.

Multiagent-system approaches are also possible. One LLM-based agent could receive user-activity narratives and use them to answer questions that another LLM-based agent might have while co-creating with the user.

Solutions and approaches should scale to group design activities. Groups of designers and engineers could participate in team chat with AI systems or agents.

I am also exploring some theoretical topics pertaining to the new dialog threads feature. Will designers and engineers want to use single dialog threads for their unfolding co-creative processes, or will they prefer different dialog threads for individual parts and subparts, individual tasks and subtasks?

In any event, with respect to implementation details, I am thinking about event listeners, streams of natural-language-described events, stream processing, and, in theory, utilizing secondary LLMs to process the resultant event stream data into natural-language narratives.
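As a rough sketch of that last idea (the function names, prompts, and event wording below are hypothetical), a secondary model could periodically condense accumulated event descriptions into a narrative:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical sketch: CAD/CAE event listeners append short natural-language
# event descriptions to a buffer; a secondary model periodically condenses the
# buffer into a narrative to be placed into the co-creating assistant's context.
event_buffer = []

def on_event(description: str) -> None:
    event_buffer.append(description)  # e.g. "User filleted edge E12 with radius 2 mm."

def summarize_events() -> str:
    events = "\n".join(event_buffer)
    event_buffer.clear()
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Condense these CAD user-activity events into a brief narrative."},
            {"role": "user", "content": events},
        ],
    )
    return response.choices[0].message.content
```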

I wonder whether you have seen Copilot or Jupyter AI? It is interesting to think about how the contents on the right-hand side of the screen could be more than simple chat dialogs for CAD/CAE scenarios.

Likewise, please do feel free to update me and the community, here, as your projects progress!

Exactly. I have a custom ChatGPT-type client that has functions for GPT-4 to access files, Blender, a Python shell, etc., and more sophisticated history management. Things keep getting better and better on the API and LLM side; this should all be viable pretty soon.
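The routing of tool calls to local handlers is nothing fancy; simplified (and with illustrative handler names only), it’s roughly:

```python
import json, subprocess
from pathlib import Path

# Simplified sketch of dispatching the model's tool calls to local handlers.
def run_python(code: str) -> str:
    proc = subprocess.run(["python", "-c", code],
                          capture_output=True, text=True, timeout=60)
    return proc.stdout + proc.stderr

def read_file(path: str) -> str:
    return Path(path).read_text()

HANDLERS = {"run_python": run_python, "read_file": read_file}

def dispatch(tool_call) -> str:
    # tool_call is one entry of response.choices[0].message.tool_calls
    args = json.loads(tool_call.function.arguments)
    return str(HANDLERS[tool_call.function.name](**args))
```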

I’m trying to get access to fine-tuning GPT-4 for exactly this purpose - improving spatial reasoning. It’s there, but you have to dig and dig and dig. With some work on getting the fine-tuning right (a lot of work), I’m pretty convinced that the things you are talking about are possible in their basic form right now, with an extremely robust framework and a well-tuned GPT-4.