How to prevent the API from returning multiple outputs?

I am using the responses API with file search.

However, sometimes it creates 2 output messages, and sometimes even 15+.

Even in the Logs page on the dashboard I see that the assistant sent 15 similar messages in a row and it eats tokens like crazy.

I have tried setting parallel_tool_calls to false, yet it still creates multiple output messages (with the same message ID).

Is there a way to force it to create only one output?

There doesn’t seem to be any “output count limit” in the API documentation.


I am seeing that too in the Responses API, non-streaming. I get multiple output items with nearly identical content. I think there are several bugs in the Responses API. The Agents SDK works around them by simply taking the first output item that fits the desired return type.
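The workaround described above can be sketched like this. This is a simplified illustration, not Agents SDK code: the items are plain dicts with a `"type"` key, whereas the real SDK returns typed objects; the helper name and item shapes are assumptions.

```python
# Hypothetical sketch of "take the first output item of the desired type".
# Item dicts are simplified stand-ins for the SDK's typed output objects.

def first_output_of_type(output_items, wanted_type="message"):
    """Return the first item whose type matches; ignore any duplicates after it."""
    for item in output_items:
        if item.get("type") == wanted_type:
            return item
    return None

# Example with duplicated message items, like the ones seen in the logs:
items = [
    {"type": "file_search_call", "id": "fs_1"},
    {"type": "message", "id": "msg_1", "content": "Answer A"},
    {"type": "message", "id": "msg_1", "content": "Answer A"},  # duplicate
]
print(first_output_of_type(items)["content"])  # → Answer A
```

This drops the duplicates client-side; it does not stop the model from generating them (or from burning the tokens).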

This is a model training fault: the model fails to emit the stop sequence that ends the message format. Are you using “gpt-4.1-mini”, perhaps?

You can add your own, using the stop parameter - oh, wait, no you can’t, because Responses doesn’t offer any token-level control, and is thus unsuitable for a great many things, such as building production applications against OpenAI models. And it still wouldn’t work inside a tool call, because you’d get parsing/validation errors if you stopped at and removed something like `"}\n\n` while using API SDK parsing methods or a strict function schema.

Yes, gpt-4.1-mini

A solution is to set max output tokens, if you know the expected output length.

It will write the first response, and the second response will be cut off when it hits the limit. Then we just use the first response.
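A rough sketch of that workaround, under stated assumptions: the `create()` parameters (`max_output_tokens`, `parallel_tool_calls`) follow the public Responses API, but the model name, the token cap of 512, the helper name, and the `FakeClient` stub (used so the sketch runs without an API key) are all illustrative, not from the posts above.

```python
# Sketch: cap max_output_tokens so a runaway duplicated second message gets
# truncated, then keep only the first message item from the output list.

def ask_once(client, prompt, cap=512):
    resp = client.responses.create(
        model="gpt-4.1-mini",        # the model reported in this thread
        input=prompt,
        max_output_tokens=cap,       # repeated outputs get cut off here
        parallel_tool_calls=False,
    )
    # Keep only the first message item; anything after it is ignored.
    return next((i for i in resp.output if i["type"] == "message"), None)

# Minimal stand-in client so the sketch is runnable without an API key:
class FakeClient:
    class responses:
        @staticmethod
        def create(**kwargs):
            class R:
                pass
            r = R()
            r.output = [
                {"type": "message", "content": "First, complete answer."},
                {"type": "message", "content": "Second, trunc"},  # cut off
            ]
            return r

print(ask_once(FakeClient(), "hello")["content"])  # → First, complete answer.
```

The cap only limits the damage; the truncated duplicate still consumes tokens up to the limit.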


Thanks for taking the time to flag this, I’ve raised the issue with OpenAI.
