I just started experiencing this problem last week with GPT-4.1 mini through Zapier chatbots. It would be great for the OpenAI team to prioritize fixing this.
My fine-tuned gpt-4.1-mini produces up to 15 additional output items for a single response request; at least 2 items seems to be the usual minimum. A fix for this is quite desirable. A setting like the old ‘n’ parameter, but for output items, would be great to have again.
I have just begun to notice this issue as well.
I am using gpt-4.1-mini and the Python Agents SDK. The agent I have created uses structured output with the fields thought (how the agent came to its response), response (the external response to the user), and finished (marking whether or not the agent thinks it has answered the user’s question). I get multiple responses when asking simple questions that are unrelated to the tools or prompting I have created. For example, “What day will tomorrow be?” or “What color is the sky?” leads to two structured responses, and continuing down that topic usually leads to more responses each turn, up to 20 at the most I have seen.
In the logging, I am getting multiple outputs for the same Responses API call (see image), which the Agents SDK uses by default. My only thought for a workaround is to try to prompt the agent not to do this (“CRITICAL: Only provide a single structured response, DO NOT respond multiple times”), but it doesn’t seem to make much of a difference.
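For reference, the structured-output schema described above would look roughly like this. This is an illustrative sketch using a stdlib dataclass (the Agents SDK also accepts Pydantic models for `output_type`); the field names come from the post, the class name is mine:

```python
from dataclasses import dataclass


@dataclass
class AgentReply:
    """Illustrative output schema matching the fields described above."""

    thought: str    # how the agent came to its response
    response: str   # the external response shown to the user
    finished: bool  # whether the agent believes the question is answered
```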
More context: I dug deeper into the logs and yes, I am getting a list of many outputs from the Responses API endpoint. Is there no way to limit the number of outputs?
Hi Jason! We would like to help with the tests too. We are working with gpt-4o via the Responses API and the file search tool. We noticed that the repetition appears significantly more frequently when using several files in the vector store.
We’ll be working to roll out the fix for this issue across all customers over the next week, if you’re able to wait until then!
That’s great. We will wait then! Have a good week!
@jasondouglas , Has the fix been rolled out yet?
It still seems to be having the same issue for me.
Any updates on the deployment of the fix?
Encountering the same problem (4.1 mini). Is there a fix by now? Is there an alternative model with similar performance and speed that doesn’t have this problem?
Is there at least a nice way to block this on my backend so only the first message is displayed and saved to logs?
(vibe-coder here, apologies)
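One possible backend-side workaround for the question above, sketched here as an assumption (it is not an official API setting): keep only the output items up to and including the first assistant message. This assumes the Responses API shape where `response.output` is a list of items, each carrying a `type` attribute (`"message"`, `"function_call"`, etc.); the function name is mine:

```python
def keep_first_message(output_items):
    """Truncate an output list after its first assistant 'message' item.

    Any tool calls or reasoning items before the first message are kept;
    any duplicate message items after it are dropped.
    """
    kept = []
    for item in output_items:
        kept.append(item)
        if getattr(item, "type", None) == "message":
            break  # ignore any repeated messages that follow
    return kept
```

You would apply this to `response.output` before displaying or logging, so only the first message survives.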
It’s insane that this issue is still present and actually showing up more and more. Is OpenAI planning to compensate for the extra tokens used? It’s been 7 months, still no fix? @OpenAI_Support Please escalate this issue?
Since 2026 started, the issue seems to have gotten worse.
Here is the detailed bug report, hope this helps @OpenAI_Support
Summary
When using the Responses API with streaming, a single request sometimes returns multiple assistant message output items (response.output_item.added) with different output_index values, but identical text. This creates duplicate outputs and increases token usage/cost.
Model
gpt-4.1-mini-2025-04-14 / gpt-4.1-mini
Response ID (from ResponseCreatedEvent)
resp_0ac6bcbdc10b129e01695e6ddbd2b4819780f8b81c764cf9b7
Time (UTC)
2026-01-07 14:29:47Z
Expected
One assistant message output item for the turn.
Actual
After completing output_index=0, the stream continues generating additional message items (output_index=1, output_index=2, …) with the same text, e.g.:
“Pouvez-vous me donner votre prénom, s’il vous plaît ?” (“Could you give me your first name, please?”)
Notes:
- parallel_tool_calls=false, temperature=0.0, tool_choice=auto
- This behavior causes duplicated downstream actions
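A client-side guard against the behavior described in this report can be sketched as follows: suppress any streamed message items after the first one, along with their follow-on events. The event and field names (`type`, `output_index`, `item`) follow the Responses API streaming shapes, but treat this as an illustrative assumption, not an official fix:

```python
def dedupe_stream(events):
    """Yield streaming events, dropping repeated assistant message items.

    Remembers the output_index of the first 'message' item added and
    suppresses every event belonging to a later output item, including
    the extra response.output_item.added events and their text deltas.
    """
    first_message_index = None
    for event in events:
        index = getattr(event, "output_index", None)
        if (getattr(event, "type", "") == "response.output_item.added"
                and getattr(event.item, "type", "") == "message"):
            if first_message_index is None:
                first_message_index = index  # keep the first message item
        # Drop events for output items that come after the first message.
        if (first_message_index is not None
                and index is not None
                and index > first_message_index):
            continue
        yield event
```

Lifecycle events without an `output_index` (e.g. `response.created`, `response.completed`) pass through untouched, so only the duplicated items are filtered.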
@jasondouglas Hey Jason, how are you doing! I want (and all of us do, haha) to know if you have any update on this. When is the fix planned to be released? Or is there any workaround you can share with us?
Any updates on this issue that was to be resolved in December?
Hello,
I have the same problem on my side. Here is an example response id: resp_0791d82beabc1598006970da6152ac8196ad61a93647235d0b
Is there any way to avoid this? I am currently using a workaround that throws out the excess answers, but it doesn’t solve the problem of cost…
Best of luck
You’ll now note this in the function calling documentation:
Note for
gpt-4.1-nano-2025-04-14: This snapshot of gpt-4.1-nano can sometimes include multiple tool calls for the same tool if parallel tool calls are enabled. It is recommended to disable this feature when using this nano snapshot.
A “we give up”?
It only scratches the surface: across all gpt-4.1 models, the issue is the AI not emitting a “stop” but continuing, producing repeats or new versions of structured outputs or function calls. The problem goes well beyond parallel tool calls (which can be disabled by an API parameter), so that is a poor inference of the fault.
# Requires the OpenAI Agents SDK; the import paths below are my assumption.
import logging

from agents import MessageOutputItem, RunResult

logger = logging.getLogger(__name__)


def filter_assistant_messages(
    run_result: RunResult,
) -> RunResult:
    """Filters out repeated assistant messages from the end of a run result.

    Args:
        run_result (RunResult): The run result to filter.

    Returns:
        RunResult: The filtered RunResult.
    """
    # Scan backwards to find the first assistant message of the trailing
    # run of consecutive assistant messages.
    first_of_trailing_run = None
    for i in range(len(run_result.new_items) - 1, -1, -1):
        if not isinstance(run_result.new_items[i], MessageOutputItem):
            break
        first_of_trailing_run = i
    if first_of_trailing_run is None:
        # The last item is not an assistant message, or new_items is empty
        # (unexpected if you run this filter right after an LLM call),
        # so keep everything as is.
        return run_result
    elif first_of_trailing_run == len(run_result.new_items) - 1:
        # Exactly one trailing assistant message: nothing to filter.
        return run_result
    # Otherwise there are at least two consecutive assistant messages at
    # the end: keep only the first of them and drop the repeats.
    logger.warning(f"Filtering assistant messages: {run_result.new_items}")
    run_result.new_items = run_result.new_items[: first_of_trailing_run + 1]
    logger.warning(f"Filtered assistant messages: {run_result.new_items}")
    return run_result
Here is the Python filter I use, if someone wants to try it. (This is inspired by @viniciusarruda’s answer on GitHub.)
It’s actually incredible that I’m still facing this issue — my previous post is here: post id 1251365.
The issue is the same: it relates to the Responses API.
