Help with choosing a pattern for a particular agent system use case

Here is an example use case for a music-related AI system with a GUI:

User prompts: “i want to create a new song about a dream i had last night about astronaut cats in outer space”

I’d like the agent(s) to then generate the following:

  • song title
  • song lyrics
  • image based on the song
  • audio clip

The key constraint is that the GUI I have in mind is very strict: say it’s a grid of 4 boxes, 2 on the top and 2 on the bottom. Where each section goes is fixed, so the title always goes in the top-left box, the lyrics in the top-right, etc. So I don’t necessarily want one single message output, as it would be difficult to divide it up the way I’d like. I know I can use output_type, but the final message output won’t return until that schema is fully formed, so the user would have to sit and wait for all 4 parts to be done, right?
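For reference, the single-schema approach I mean would look roughly like the sketch below. SongBundle and its field names are made up for illustration, and I’m assuming output_type accepts a plain dataclass like this one.

```python
from dataclasses import dataclass, fields


@dataclass
class SongBundle:
    """Hypothetical schema holding all 4 GUI components in one output."""
    title: str
    lyrics: str
    image_url: str   # assumption: the image is returned as a URL
    audio_url: str   # assumption: the audio clip is returned as a URL


# Assumed usage (not executed here): Agent(..., output_type=SongBundle)
# The object only exists once every field is filled in, so nothing can be
# shown in any of the 4 boxes until the slowest component has finished.
component_names = [f.name for f in fields(SongBundle)]
```

Mapping the fields to boxes is easy once the object exists; the problem is purely the wait before it exists.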

Most importantly, I’d like each of the 4 components to start streaming its results to the user token by token as soon as it is ready.
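To make the requirement concrete, here is a minimal sketch of the behavior I’m after, written with plain asyncio and no agent framework. fake_stream is a hypothetical stand-in for whatever token stream each agent would expose, and the panel names are placeholders for the fixed GUI boxes.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_stream(chunks: List[str], delay: float) -> AsyncIterator[str]:
    """Hypothetical stand-in for one agent's token-by-token output."""
    for chunk in chunks:
        await asyncio.sleep(delay)
        yield chunk


async def pump(panel: str, stream: AsyncIterator[str], queue: asyncio.Queue) -> None:
    """Forward each chunk to the shared queue, tagged with its target panel."""
    async for chunk in stream:
        await queue.put((panel, chunk))


async def run_all(streams: Dict[str, AsyncIterator[str]]) -> Dict[str, str]:
    """Run every stream concurrently and route chunks to fixed panels."""
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(pump(p, s, queue)) for p, s in streams.items()]
    panels = {panel: "" for panel in streams}

    async def close_when_done() -> None:
        # Once every producer has finished, signal the consumer to stop.
        await asyncio.gather(*tasks)
        await queue.put(None)

    closer = asyncio.create_task(close_when_done())
    while (item := await queue.get()) is not None:
        panel, chunk = item
        panels[panel] += chunk  # a real GUI would append to the box here
    await closer
    return panels
```

Each panel fills in as its own chunks arrive, regardless of how far along the others are. What I can’t figure out is which agent pattern gives me 4 independent streams like this.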

I’m thinking about the decentralized pattern instead of the manager pattern, where a “triage” agent hands off to specialized agents. But a handoff means the specialized agent takes over execution, so each handoff would happen synchronously, one after another? If the triage agent has 4 handoffs, can they all occur in parallel? And can each specialized agent then stream tool output directly back to the user?

Multiple handoffs cannot occur at once; an error is thrown. So that pattern is not suitable for parallel agent/tool runs.

And if I use the agents-as-tools (manager) pattern, like:
tools=[song_title_agent.as_tool(), lyrics_agent.as_tool(), artwork_agent.as_tool(), audio_agent.as_tool()]
then the main manager agent will not respond with message_output until it has received output from all of its tools. So streaming token by token does not make sense in this scenario. And you can’t stream tool outputs token by token, correct?