Multi-Turn, Multi-Stakeholder Conversations Using the Agents SDK

Hello there, I'm building a chat space where multiple users can talk with multiple agents in the same chat.

I'm trying to use the Agents SDK with multi-provider support via OpenAIChatCompletionsModel, but I'm having trouble with the input format.
My understanding is that Runner.run expects either a str, which is passed downstream as {"role": "user", "content": "input string"}, or a list[TResponseInputItem] with the respective role/content attributes for each item.
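Concretely, this is what I'm doing (simplified sketch; the agent definition is just a placeholder):

```python
import asyncio
from agents import Agent, Runner

agent = Agent(name="assistant", instructions="Be concise.")

async def main():
    # 1) a plain string, wrapped downstream as a single {"role": "user", ...} message
    r1 = await Runner.run(agent, "hello")

    # 2) an explicit list of input items with role/content attributes
    r2 = await Runner.run(agent, [{"role": "user", "content": "hello again"}])
    print(r1.final_output, r2.final_output)

asyncio.run(main())
```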

My problem is that this schema doesn't let me achieve the multi-stakeholder dynamic. Since we only have user/assistant granularity, the models can't infer the correct context for each participant and usually mix things up.

Is there any architectural solution in place for this kind of problem?

Two ways I’ve been able to do it:

  1. Use the system prompt to tell the model that the "user" represents a room full of people. Then, on each user turn, include one or more turns from different participants, so the message the model sees has role "user" but its content is a list like "speaker1: what speaker1 said" followed by "speaker2: what speaker2 said", and so on. This works because ultimately it is all serialized into context for inference, and your system prompt explains what's happening. (A rough sketch follows this list.)

  2. Use the system prompt to tell the assistant that its job is to call on each of N speakers in turn so they can each respond. Then give it a tool that lets it ask a speaker for its response. The model should call speaker 1, then 2, then 3, and so forth. You can do whatever you want with the user turn. (Also sketched below.)
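Here's a rough sketch of the first pattern, assuming the Agents SDK's Agent/Runner API; the speaker names and the pack_turns helper are my own illustration, not part of the SDK:

```python
import asyncio
from agents import Agent, Runner

facilitator = Agent(
    name="facilitator",
    instructions=(
        "You are one participant in a group chat. Each user message contains a "
        "transcript of what several speakers just said, formatted as 'name: text'. "
        "Address speakers by name when you respond."
    ),
)

def pack_turns(turns: list[tuple[str, str]]) -> list[dict]:
    """Serialize several speakers' turns into a single user-role input item."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return [{"role": "user", "content": transcript}]

async def main():
    turns = [
        ("alice", "I think we should ship on Friday."),
        ("bob", "Friday is too soon; QA needs another week."),
    ]
    result = await Runner.run(facilitator, pack_turns(turns))
    print(result.final_output)

asyncio.run(main())
```

And a sketch of the second pattern, with a tool the orchestrator calls once per speaker; again, the sub-agents here are made up for illustration:

```python
from agents import Agent, Runner, function_tool

speakers = {
    "alice": Agent(name="alice", instructions="You are Alice, an optimistic product manager."),
    "bob": Agent(name="bob", instructions="You are Bob, a cautious QA lead."),
}

@function_tool
async def ask_speaker(speaker: str, prompt: str) -> str:
    """Ask the named speaker to respond to the prompt and return their reply."""
    result = await Runner.run(speakers[speaker], prompt)
    return result.final_output

orchestrator = Agent(
    name="orchestrator",
    instructions=(
        "For every user message, call ask_speaker once for each of: alice, bob. "
        "Then summarize the discussion in one short reply."
    ),
    tools=[ask_speaker],
)
```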

I've seen the first pattern work well in longer conversations with a few "agents". I've seen the second pattern work well when there are 6+ agents, but I have not tried it in long conversations.

One or both of these may seem like hacks, and in fact they are really just adaptations to the underlying limitation of user/assistant roles, as you point out. But realize this is not a simple limitation: it's how the model has been fine-tuned to handle chat. So without a model trained on a broader multi-agent dynamic, you're stuck.

I would run a simple completion-style test where you create a multi-agent transcript formatted like a play script for human actors. Prompt the model with that and see whether it can understand the perspective of the various characters. If it can, it will probably do what you want.
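Something along these lines, using the plain chat.completions client; the transcript content is made up:

```python
from openai import OpenAI

client = OpenAI()
transcript = (
    "ALICE: We should ship on Friday.\n"
    "BOB: QA needs another week.\n"
    "CAROL: Marketing already announced Friday.\n"
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You will read a group-chat transcript formatted like a play script."},
        {"role": "user", "content": transcript + "\nWhat does BOB want, and why?"},
    ],
)
print(resp.choices[0].message.content)
```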
