My idea is to develop a proxy assistant that serves as a router and decision-maker for specialized sub-assistants, each dedicated to particular tasks and accessible via function calls. However, when the proxy assistant recognizes the need to engage one of these sub-assistants, it hits an issue: the run gets stuck in the “requires_action” state, which prevents starting another run with the sub-assistant on the same thread.
I’m aware that “When a Run is in_progress and not in a terminal state, the Thread is locked,” so this is the expected behavior, but…
Has anyone else built a similar solution in a different way or found a workaround for this issue?
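To make the failure concrete, here is a minimal sketch of what I’m running into (the assistant IDs are placeholders, and exposing each sub-assistant to the proxy as a function tool is an assumption about my setup):

```python
import time
from openai import OpenAI

client = OpenAI()

PROXY_ASSISTANT_ID = "asst_proxy_xxx"  # placeholder IDs
SUB_ASSISTANT_ID = "asst_sub_xxx"

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Research topic X for me."
)

run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=PROXY_ASSISTANT_ID
)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# The proxy decided it needs a sub-assistant, so the run now sits in
# "requires_action" and the thread is locked: starting a second run fails.
if run.status == "requires_action":
    client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=SUB_ASSISTANT_ID
    )  # -> 400 error: the thread already has an active run
```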
How are you implementing the sub-assistants? I’m thinking you could perhaps create an endpoint for each sub-assistant and just call it like an external API.
Since an assistant and a run are separate notions in the Assistants API, you can execute runs with any number of different assistants within one thread. The only problem is that these assistants are not aware of each other and treat all previous assistant messages as their own.
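For instance, something like this runs two assistants back to back on one thread (a sketch; the IDs are placeholders, and each run must reach a terminal state before the next starts):

```python
import time
from openai import OpenAI

client = OpenAI()
thread = client.beta.threads.create()

def run_and_wait(assistant_id: str):
    """Run one assistant on the shared thread and block until it finishes."""
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    return run

client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Draft a short product blurb."
)
run_and_wait("asst_writer_xxx")  # writer responds first
run_and_wait("asst_critic_xxx")  # the critic sees the writer's reply as its own
```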
I’ll break the reasoning down into two concepts to make it easier to follow. My initial idea was to have several assistants operating within the same thread and a proxy assistant acting as an orchestrator. Each sub-assistant is exposed as a function call, so when the proxy assistant identifies the need for a sub-assistant, its run enters the ‘requires_action’ state, and apparently that state still counts as in progress. Therefore I can’t run the sub-assistants on the same thread, because it stays locked.
The second approach I followed uses the same set of proxy assistant and sub-assistants, but when the proxy assistant identifies the need for a sub-assistant, an individual thread is created for that sub-assistant and the necessary information is passed to it through a function call. This allowed a back-and-forth between the two threads, like a conversation. I ended up doing something like this: Optimizing AI: Building Advanced Multi-Agent Frameworks with Azure OpenAI Assistant API
This works; however, the back-and-forth between the threads takes a lot of time, and OpenAI often ends up cancelling the run or marking it as expired.
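For reference, the hand-off in that second approach looks roughly like this (a sketch; the sub-assistant ID and the “task” argument of the hypothetical tool are assumptions about my setup):

```python
import json
import time

from openai import OpenAI

client = OpenAI()
SUB_ASSISTANT_ID = "asst_sub_xxx"  # placeholder

def handle_requires_action(run, main_thread_id: str):
    """Serve the proxy's tool calls by running the sub-assistant on a side thread."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        args = json.loads(call.function.arguments)
        # One dedicated thread per sub-assistant exchange.
        sub_thread = client.beta.threads.create()
        client.beta.threads.messages.create(
            thread_id=sub_thread.id, role="user", content=args["task"]
        )
        sub_run = client.beta.threads.runs.create(
            thread_id=sub_thread.id, assistant_id=SUB_ASSISTANT_ID
        )
        # This polling back-and-forth is where runs tend to expire.
        while sub_run.status not in ("completed", "failed", "cancelled", "expired"):
            time.sleep(1)
            sub_run = client.beta.threads.runs.retrieve(
                thread_id=sub_thread.id, run_id=sub_run.id
            )
        answer = client.beta.threads.messages.list(
            thread_id=sub_thread.id, order="desc"
        ).data[0].content[0].text.value
        outputs.append({"tool_call_id": call.id, "output": answer})
    # Submitting the outputs unblocks the proxy's run on the main thread.
    client.beta.threads.runs.submit_tool_outputs(
        thread_id=main_thread_id, run_id=run.id, tool_outputs=outputs
    )
```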
So when you try to hit an API from a GPT, it asks the user for permission before it continues. Maybe it’s hanging while waiting for user authorization? If so, perhaps there is a way you can respond programmatically and move on?
Yeah, sometimes it does. Since my front end is a chat, users can reply to keep the conversation going. This is good because it starts a new run. I’ve noticed that OpenAI sometimes cancels a run if it takes too long to reach ‘completed’, even while it’s still ‘in progress’.
I’ll try it on my use-case. As of now, I’m using the following architecture:
Several function-specific assistants
1 thread
I start with a welcoming assistant that has one tool. Besides performing the needed action, this tool moves the thread to the next assistant and updates that assistant’s instructions based on input from the conversation with the first one.
For example, a chatbot site editor:
the welcoming assistant requests the page the user wants to manage
once received, a tool call adds the page_id to the thread metadata and switches the current assistant id to the next one
the next assistant can edit the page’s text (tool 1), layout (tool 2), and page settings (tool 3)
there can be a further assistant that we move to after any tool use (see the sketch after this list)
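A sketch of how that hand-off tool can work, assuming the “current assistant” is tracked in the thread’s metadata (the metadata keys and IDs here are made up):

```python
from openai import OpenAI

client = OpenAI()
WELCOME_ASSISTANT_ID = "asst_welcome_xxx"  # placeholder

def select_page(thread_id: str, page_id: str, next_assistant_id: str):
    """Tool handler: record the chosen page and route the thread onward."""
    # Thread metadata survives across runs, so the app can read it later
    # to decide which assistant handles the next user message.
    client.beta.threads.update(
        thread_id,
        metadata={"page_id": page_id, "current_assistant_id": next_assistant_id},
    )

def assistant_for(thread_id: str) -> str:
    """Look up which assistant should run on this thread right now."""
    thread = client.beta.threads.retrieve(thread_id)
    return (thread.metadata or {}).get("current_assistant_id", WELCOME_ASSISTANT_ID)
```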
Forgive me if I’ve totally misunderstood your problem, but why not copy the messages in the thread and create a new thread with those messages for the sub-assistants to run on?
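Something along these lines, for anyone who wants to try it (a sketch for text-only messages; the [role] prefix is just one convention for preserving who said what, since the copied messages are re-created as user messages):

```python
from openai import OpenAI

client = OpenAI()

def fork_thread(source_thread_id: str) -> str:
    """Copy a thread's text messages into a fresh thread for a sub-assistant."""
    history = client.beta.threads.messages.list(
        thread_id=source_thread_id, order="asc"
    )
    copy = client.beta.threads.create()
    for msg in history.data:
        text = "".join(
            part.text.value for part in msg.content if part.type == "text"
        )
        # Tag each copied message with its original role.
        client.beta.threads.messages.create(
            thread_id=copy.id, role="user", content=f"[{msg.role}] {text}"
        )
    return copy.id
```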
In that sample, the user communicates with the TaskPlannerAgent (which uses the Chat Completions API) in their own thread. The TaskPlannerAgent creates and schedules tasks for the necessary assistants to fulfill the user’s request, based on the request itself and on knowledge of the available assistants (see the YAML configurations for CodeProgrammerAgent and CodeInspectionAgent in the config folder).
Think of my example as a kind of “hello world” for assistants. I purposely kept it simple. In this case I saw “writer” and “critic” as peers, so the notion of sub-assistants and multiple threads really didn’t apply, but for more complex real-world examples you are right. In a world without AI, can the problem be solved by getting a small group of ‘experts’ into a room, or does the solution require a larger team where small groups go into breakout sessions and then report back for further discussion until a solution is reached? Assistants/agents require us to think about what kind of expertise we need and how to structure the team doing the problem solving.
For the first production version, I ended up with two Assistants like this:
Task Planner Agent: responsible for creating the task execution plan and executing it.
Critic Agent: responsible for reviewing the generated content and providing feedback.
The Task Planner Agent calls the Critic Agent through a function call, roughly as sketched below.
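The wiring for that is a plain function tool on the planner; here is a sketch with a made-up call_critic definition (the name, model, and instructions are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool the Task Planner uses to hand drafts to the Critic.
planner = client.beta.assistants.create(
    name="Task Planner Agent",
    model="gpt-4-turbo",  # placeholder model
    instructions="Plan the task, execute it, and send drafts to the critic.",
    tools=[{
        "type": "function",
        "function": {
            "name": "call_critic",
            "description": "Ask the Critic Agent to review generated content.",
            "parameters": {
                "type": "object",
                "properties": {
                    "content": {
                        "type": "string",
                        "description": "The draft to review",
                    }
                },
                "required": ["content"],
            },
        },
    }],
)
```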
I kept the strategy of a main assistant with a main thread and secondary threads for the sub-assistants. This approach worked best for me: most of the time, the final output is of better quality than with just one agent working on the thread.
I couldn’t handle the run cancellation cleanly enough to execute the sub-assistant on the main thread.
That would be a huge increase in token usage. And the Assistants API doesn’t work like the Chat Completions API; I can’t add a message with the assistant role to the thread.
Btw, welcome to the community @WokeBloke!!! Thanks for joining us.
Totally agree.
However, passing the chat history to a secondary thread in a coherent way through function calling is a challenge. Sometimes it works; sometimes it sends only half of the information to the sub-assistant.
So I wonder whether, with a larger set of assistants, we can manage this information transfer via function calls without losing relevant information along the way.
I hope this multi-agent workflow is addressed in the next major update from OpenAI.
I’ll roll out this version and gather user feedback.
Note: I ended up implementing message streaming, which has decreased the response time a bit.
Andres, I think you can use the event system to solve this. Just queue the operation that needs access to the thread and run it when you get a thread.run.completed message.
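Something like this, if I understand the suggestion (a sketch using the SDK’s streaming event handler; the queue plumbing and IDs are hypothetical):

```python
import queue

from openai import OpenAI, AssistantEventHandler

client = OpenAI()
THREAD_ID = "thread_xxx"               # placeholders
PROXY_ASSISTANT_ID = "asst_proxy_xxx"

pending = queue.Queue()  # operations waiting for the thread to unlock

class RunWatcher(AssistantEventHandler):
    def on_event(self, event):
        # Drain queued thread operations once the run reaches a terminal state.
        if event.event == "thread.run.completed":
            while not pending.empty():
                pending.get()()  # each queued item is a zero-arg callable

# Queue a follow-up that needs the thread, then stream the proxy's run.
pending.put(lambda: client.beta.threads.messages.create(
    thread_id=THREAD_ID, role="user", content="[SubAssistant] result goes here"
))
with client.beta.threads.runs.stream(
    thread_id=THREAD_ID,
    assistant_id=PROXY_ASSISTANT_ID,
    event_handler=RunWatcher(),
) as stream:
    stream.until_done()
```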
Hey @andres_santos! I’ve set out to create the same orchestration model you have, with one proxy assistant and thread, and several sub-assistants with their own threads.
Preface: I’m not an engineer or developer; I’m just super keen on trying this type of orchestration on a side project I’m working on. I used the Azure Multimodal Multi-Agents example, but I can’t quite get there even though I detached the Azure dependencies and rewrote the API calls to spec.
Would you be willing to share this project of yours to the community?
I was doing something similar to you, and here is what I’ve done to overcome this issue.
You have one ProxyAssistant (or whatever you want to call it) that is responsible for delegating tasks to other agents/assistants.
When the user query comes in, the ProxyAssistant calls the ‘choose_agent’ function and thereby enters the ‘requires_action’ state.
When I call the agent it selected (let’s say it picked SubAssistant), I copy the messages from the ProxyAssistant thread onto a new thread; this new agent calls a bunch of methods and, let’s say, generates a response. That response now lives in the SubAssistant thread.
So I copy the response from the SubAssistant, but I can’t attach it to the ProxyAssistant’s thread because it is locked.
I did the following:
Submit tools: I called the submit_tool_outputs endpoint to try to finish the original ProxyAssistant run.
I then take the run_id of the run in which the ProxyAssistant called the agent and call the ‘cancel’ run endpoint on it.
I can now add the response from the SubAssistant to the ProxyAssistant thread and initiate a new run.
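In code, the cancel-and-resume portion is roughly this (a sketch; IDs and status handling simplified, and see the note below about the ‘cancelling’ state):

```python
import time

from openai import OpenAI

client = OpenAI()

def unlock_and_resume(thread_id: str, run, sub_answer: str, proxy_assistant_id: str):
    """Cancel the proxy's stuck run, then re-run with the sub-assistant's answer."""
    client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run.id)
    # Wait out the "cancelling" -> "cancelled" transition before touching the thread.
    while run.status not in ("cancelled", "completed", "failed", "expired"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
    # The thread is unlocked now; attach the SubAssistant's answer and go again.
    client.beta.threads.messages.create(
        thread_id=thread_id, role="user", content=f"[SubAssistant] {sub_answer}"
    )
    return client.beta.threads.runs.create(
        thread_id=thread_id, assistant_id=proxy_assistant_id
    )
```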
NOTE: When cancelling a run, it can go into a ‘cancelling’ state before it is ‘cancelled’. Also, there can be a race condition, because I encountered this today:
148 if self.decider_run and self.decider_run.status == "requires_action":
--> 149 self.client.beta.threads.runs.cancel(thread_id=self.decider_thread.id, run_id=self.decider_run.id)
150 self.logger.info(f"Cancelled a run with ID: {self.decider_run.id}")
BadRequestError: Error code: 400 - {'error': {'message': "Cannot cancel run with status 'cancelled'.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
I ended up doing some hacky while checks to ensure the run was cancelled before proceeding (something like the sketch below).
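For anyone hitting the same race, wrapping the cancel in a tolerant helper avoids both the double-cancel 400 and an unbounded busy-wait (a sketch):

```python
import time

from openai import OpenAI, BadRequestError

client = OpenAI()

def cancel_and_wait(thread_id: str, run_id: str, timeout: float = 30.0):
    """Request cancellation, tolerating runs already in a terminal state."""
    try:
        client.beta.threads.runs.cancel(thread_id=thread_id, run_id=run_id)
    except BadRequestError:
        pass  # already cancelled/completed; nothing left to do
    deadline = time.time() + timeout
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in ("cancelled", "completed", "failed", "expired"):
            return run
        if time.time() > deadline:
            raise TimeoutError(f"Run {run_id} still '{run.status}' after {timeout}s")
        time.sleep(0.5)
```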
I agree this is not a perfect or even production-quality solution, but it does the trick for now. I’d love to hear how you’ve resolved this.