I’ve been encountering an issue where the API returns more than one assistant message in the output.
I first noticed this behavior with the Agents SDK, but after further investigation I found that the API behaves the same way with the OpenAI SDK and even with plain cURL requests. I’m using tools and structured outputs, and the responses consistently contain multiple assistant messages, sometimes even repeated ones.
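For reference, this is roughly how I’m detecting the duplicates on my side. The item shapes below are a simplified stand-in for the items in `response.output`, and the helper is my own; nothing here reflects the API’s internals:

```python
def count_assistant_messages(output):
    """Count the items in a (simplified) response output that are
    assistant messages, skipping tool calls and other item types."""
    return sum(
        1
        for item in output
        if item.get("type") == "message" and item.get("role") == "assistant"
    )

# Simplified example of what I'm seeing: one tool call followed by
# two assistant messages, each matching the output schema.
output = [
    {"type": "function_call", "name": "lookup"},
    {"type": "message", "role": "assistant", "content": '{"answer": "42"}'},
    {"type": "message", "role": "assistant", "content": '{"answer": "42"}'},
]

print(count_assistant_messages(output))  # prints 2
```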
It seems similar to what others have reported here:
- openai/openai-agents-python#1814
- Assistant API Repeat the same message
- Prevent multiple Assistant messages in a run
I’d like to clarify whether this is an actual bug, or if it’s expected behavior that could be mitigated by prompt tuning.
Since I don’t know the exact inner workings of the API, my current understanding is that there are two possible mechanisms:
- Option 1: The LLM generates tokens until it reaches an EOS (end-of-sequence) token (or the token limit, which is not the case here). Afterward, the API parses the full output and attempts to cast it into the defined output schema.
- Option 2: The LLM generates tokens until the output matches the defined schema, or until EOS (or the token limit).
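To make the two options concrete, here is a toy sketch. Everything in it is invented for illustration (the token stream, the `is_complete_json` check standing in for “satisfies the schema”) and bears no relation to the actual API internals:

```python
import json

EOS = "<eos>"

def is_complete_json(text):
    """Toy stand-in for 'the output satisfies the schema'."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def option_1(tokens):
    """Option 1: generate until EOS, parse the full output afterwards.
    Nothing stops the model from emitting several schema-shaped chunks."""
    out = []
    for tok in tokens:
        if tok == EOS:
            break
        out.append(tok)
    return "".join(out)

def option_2(tokens):
    """Option 2: check the schema after every token and stop on the
    first match, so at most one schema-shaped chunk is produced."""
    out = []
    for tok in tokens:
        if tok == EOS:
            break
        out.append(tok)
        if is_complete_json("".join(out)):
            break
    return "".join(out)

# A token stream where the model emits the same message twice before EOS.
stream = ['{"answer": 42}', '{"answer": 42}', EOS]
print(option_1(stream))  # both copies survive: '{"answer": 42}{"answer": 42}'
print(option_2(stream))  # generation stops after the first copy
```

Under Option 1, the duplicated output would only be split into separate assistant messages after generation; under Option 2, the second copy would never be generated at all.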
If this isn’t a bug, I assume Option 1 is how the system works. However, Option 2 feels like the more intuitive design: generation should stop once the schema is satisfied.
That leads me to my main question:
If the model is indeed matching the schema during generation, why doesn’t the API stop after the first assistant message that satisfies the schema? Instead, I’m seeing multiple assistant messages that each match it.
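In the meantime, my client-side workaround is to keep only the first assistant message that validates against the schema. A sketch of that, again with a simplified item shape and a placeholder schema check (the `required_keys` idea is mine, not part of any SDK):

```python
import json

def first_valid_message(output, required_keys=("answer",)):
    """Return the parsed content of the first assistant message whose
    content is valid JSON containing all required keys; None otherwise."""
    for item in output:
        if item.get("type") != "message" or item.get("role") != "assistant":
            continue
        try:
            parsed = json.loads(item["content"])
        except (json.JSONDecodeError, KeyError, TypeError):
            continue
        if all(key in parsed for key in required_keys):
            return parsed
    return None

output = [
    {"type": "function_call", "name": "lookup"},
    {"type": "message", "role": "assistant", "content": '{"answer": "first"}'},
    {"type": "message", "role": "assistant", "content": '{"answer": "second"}'},
]

print(first_valid_message(output))  # {'answer': 'first'}
```

This works, but it is exactly the kind of filtering I’d expect the API to do for me if the schema really is enforced during generation.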