Here is an example use case for a music-related AI system with a GUI:
User prompts: “i want to create a new song about a dream i had last night about astronaut cats in outer space”
I’d like the agent(s) to then generate the following:
- song title
- song lyrics
- image based on the song
- audio clip
The key constraint is that the GUI I have in mind is very strict: say, a grid of 4 boxes, 2 on the top and 2 on the bottom. Where each section goes is fixed, so the title always goes in the top-left box, the lyrics in the top-right, etc. So I don’t want one single message output, as it would be difficult to divide it up the way I’d like. I know I can use output_type, but the final message output won’t return until that schema is fully formed, so the user would have to sit and wait for all 4 parts to be done, right?
Most importantly, I’d like each of the 4 components to start streaming its results to the user token by token as soon as it is ready.
I’m thinking about the decentralized pattern instead of the manager pattern, where a “triage” agent hands off to specialized agents. But a handoff means the specialized agent takes over execution, so would each handoff happen synchronously? If the triage agent has 4 handoffs, can all of them occur in parallel? And can each specialized agent then stream its tool output directly back to the user?
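
To make the goal concrete, here is a rough sketch of the behavior I’m after, written with plain asyncio and no agent library at all. Everything in it is hypothetical: fake_specialist is a stand-in for whatever yields one specialized agent’s text chunks (with the Agents SDK that might be a wrapper around a streamed run, but that mapping is an assumption), and the panel names are just labels for the four boxes. The idea is to start all four streams concurrently in code, tag each chunk with its panel, and let the GUI route chunks by tag.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List, Optional, Tuple

# Hypothetical labels for the four fixed boxes of the 2x2 grid.
PANELS = ("title", "lyrics", "image", "audio")


async def fake_specialist(panel: str, prompt: str) -> AsyncIterator[str]:
    """Stand-in for one specialized agent's token stream (not a real SDK call)."""
    for token in f"{panel} for: {prompt}".split():
        await asyncio.sleep(0)  # give the other streams a chance to run
        yield token + " "


async def pump(panel: str, stream: AsyncIterator[str], queue: asyncio.Queue) -> None:
    """Forward every chunk of one stream into the shared queue, tagged by panel."""
    async for chunk in stream:
        await queue.put((panel, chunk))
    await queue.put((panel, None))  # sentinel: this panel has finished


async def stream_all(prompt: str, on_chunk: Callable[[str, str], None]) -> None:
    """Run all four streams concurrently and hand each chunk to the GUI callback."""
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(pump(panel, fake_specialist(panel, prompt), queue))
        for panel in PANELS
    ]
    remaining = len(tasks)
    while remaining:
        item: Tuple[str, Optional[str]] = await queue.get()
        panel, chunk = item
        if chunk is None:
            remaining -= 1
        else:
            on_chunk(panel, chunk)  # e.g. append the text to that panel's box
    await asyncio.gather(*tasks)  # surface any exception raised inside a pump


def demo(prompt: str) -> Tuple[Dict[str, str], List[str]]:
    """Collect the chunks per panel and record the order in which they arrived."""
    boxes = {panel: "" for panel in PANELS}
    order: List[str] = []

    def on_chunk(panel: str, chunk: str) -> None:
        boxes[panel] += chunk
        order.append(panel)

    asyncio.run(stream_all(prompt, on_chunk))
    return boxes, order


if __name__ == "__main__":
    boxes, _ = demo("astronaut cats in outer space")
    for panel, text in boxes.items():
        print(f"[{panel}] {text}")
```

In this sketch the parallelism lives in ordinary code rather than in a triage agent’s handoffs, and each box fills independently instead of waiting for one combined output. Whether handoffs can achieve the same thing is exactly what I’m unsure about.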