I just started experiencing this problem last week with GPT-4.1 mini through Zapier chatbots. It would be great for the OpenAI team to prioritize fixing this.
My fine-tuned gpt-4.1-mini produces up to 15 additional output items for a single response request; at least 2 items seems to be the usual minimum. A fix for this is quite desirable. A setting like the old ‘n’ parameter, but for output items, would be great to have again.
I have just begun to notice this issue as well.
I am using gpt-4.1-mini and the Python Agents SDK. The agent I have created uses structured output with the fields thought (how the agent came to its response), response (the external response to the user), and finished (marking whether or not the agent thinks it has answered the user’s question). I get multiple responses when asking simple questions that are unrelated to the tools or prompting I have created. For example, “What day will tomorrow be?” or “What color is the sky?” leads to two structured responses, and continuing down that topic usually leads to more responses each turn, up to 20 at the most I have seen.
In the logging, I am getting multiple outputs for the same Responses API call (see image), which the Agents SDK uses by default. My only thought for a workaround is to try to prompt the agent not to do this (“CRITICAL: Only provide a single structured response, DO NOT respond multiple times”), but it doesn’t seem to make much of a difference.
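For reference, the structured-output schema described above would look roughly like this. This is an illustrative sketch using a stdlib dataclass (the Agents SDK also accepts Pydantic models for `output_type`); the field names come from the post, the class name is mine:

```python
from dataclasses import dataclass


@dataclass
class AgentReply:
    """Illustrative output schema matching the fields described above."""

    thought: str    # how the agent came to its response
    response: str   # the external response shown to the user
    finished: bool  # whether the agent believes the question is answered
```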
More context: I dug deeper into the logs and yes, I am getting a list of many outputs from the Responses API endpoint. Is there no way to limit the number of outputs?
Hi Jason! We would like to help with the tests too. We are working with gpt-4o via the Responses API and the file search tool. We noticed that the repetition appears significantly more frequently when using several files in the vector store.
We’ll be working to roll out the fix for this issue across all customers over the next week, if you’re able to wait until then!
That’s great. We will wait then! Have a good week!
@jasondouglas , Has the fix been rolled out yet?
It still seems to be having the same issue for me.
Any updates on the deployment of the fix?
Encountering the same problem (4.1 mini). Is there a fix by now? Is there an alternative model with similar performance and speed that doesn’t have this problem?
Is there at least a nice way to block this on my backend so only the first message is displayed and saved to logs?
(vibe-coder here, apologies)
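One possible backend-side workaround for the question above, sketched here as an assumption (it is not an official API setting): keep only the output items up to and including the first assistant message. This assumes the Responses API shape where `response.output` is a list of items, each carrying a `type` attribute (`"message"`, `"function_call"`, etc.); the function name is mine:

```python
def keep_first_message(output_items):
    """Truncate an output list after its first assistant 'message' item.

    Any tool calls or reasoning items before the first message are kept;
    any duplicate message items after it are dropped.
    """
    kept = []
    for item in output_items:
        kept.append(item)
        if getattr(item, "type", None) == "message":
            break  # ignore any repeated messages that follow
    return kept
```

You would apply this to `response.output` before displaying or logging, so only the first message survives.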
It’s insane that this issue is still present and actually showing up more and more. Is OpenAI planning to compensate for the extra tokens used? It’s been 7 months, still no fix? @OpenAI_Support Please escalate this issue?
Since 2026 started, the issue seems to have gotten worse.
Here is the detailed bug report, hope this helps @OpenAI_Support
Summary
When using the Responses API with streaming, a single request sometimes returns multiple assistant message output items (response.output_item.added) with different output_index values, but identical text. This creates duplicate outputs and increases token usage/cost.
Model
gpt-4.1-mini-2025-04-14 / gpt-4.1-mini
Response ID (from ResponseCreatedEvent)
resp_0ac6bcbdc10b129e01695e6ddbd2b4819780f8b81c764cf9b7
Time (UTC)
2026-01-07 14:29:47Z
Expected
One assistant message output item for the turn.
Actual
After completing output_index=0, the stream continues generating additional message items (output_index=1, output_index=2, …) with the same text, e.g.:
“Pouvez-vous me donner votre prénom, s’il vous plaît ?” (“Could you give me your first name, please?”)
Notes:
- parallel_tool_calls=false, temperature=0.0, tool_choice=auto
- This behavior causes duplicated downstream actions
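A client-side guard against the behavior described in this report can be sketched as follows: suppress any streamed message items after the first one, along with their follow-on events. The event and field names (`type`, `output_index`, `item`) follow the Responses API streaming shapes, but treat this as an illustrative assumption, not an official fix:

```python
def dedupe_stream(events):
    """Yield streaming events, dropping repeated assistant message items.

    Remembers the output_index of the first 'message' item added and
    suppresses every event belonging to a later output item, including
    the extra response.output_item.added events and their text deltas.
    """
    first_message_index = None
    for event in events:
        index = getattr(event, "output_index", None)
        if (getattr(event, "type", "") == "response.output_item.added"
                and getattr(event.item, "type", "") == "message"):
            if first_message_index is None:
                first_message_index = index  # keep the first message item
        # Drop events for output items that come after the first message.
        if (first_message_index is not None
                and index is not None
                and index > first_message_index):
            continue
        yield event
```

Lifecycle events without an `output_index` (e.g. `response.created`, `response.completed`) pass through untouched, so only the duplicated items are filtered.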
@jasondouglas Hey Jason, how are you doing! I want (and all of us do, haha) to know if you have any update on this. When is the fix planned to be released? Or is there any workaround you can share with us?
Any updates on this issue that was to be resolved in December?
Hello,
I have the same problem on my side. Here is an example response id: resp_0791d82beabc1598006970da6152ac8196ad61a93647235d0b
Is there any way to avoid this? I am currently using a workaround that throws out the excess answers, but it doesn’t solve the problem of cost…
Best of luck
You’ll now note this in the function calling documentation:
Note for
gpt-4.1-nano-2025-04-14: This snapshot of gpt-4.1-nano can sometimes include multiple tool calls for the same tool if parallel tool calls are enabled. It is recommended to disable this feature when using this nano snapshot.
A “we give up”?
It only scratches the surface: across all gpt-4.1 models, the issue is the AI not emitting a “stop” but continuing, producing repeats or new versions of structured outputs or function calls. The problem goes well beyond parallel tool calls (which can be disabled by an API parameter), so that is a poor inference of the fault.
# Requires the OpenAI Agents SDK; the import paths below are my assumption.
import logging

from agents import MessageOutputItem, RunResult

logger = logging.getLogger(__name__)


def filter_assistant_messages(
    run_result: RunResult,
) -> RunResult:
    """Filters out repeated assistant messages from the end of a run result.

    Args:
        run_result (RunResult): The run result to filter.

    Returns:
        RunResult: The filtered RunResult.
    """
    # Scan backwards to find the first assistant message of the trailing
    # run of consecutive assistant messages.
    first_of_trailing_run = None
    for i in range(len(run_result.new_items) - 1, -1, -1):
        if not isinstance(run_result.new_items[i], MessageOutputItem):
            break
        first_of_trailing_run = i
    if first_of_trailing_run is None:
        # The last item is not an assistant message, or new_items is empty
        # (unexpected if you run this filter right after an LLM call),
        # so keep everything as is.
        return run_result
    elif first_of_trailing_run == len(run_result.new_items) - 1:
        # Exactly one trailing assistant message: nothing to filter.
        return run_result
    # Otherwise there are at least two consecutive assistant messages at
    # the end: keep only the first of them and drop the repeats.
    logger.warning(f"Filtering assistant messages: {run_result.new_items}")
    run_result.new_items = run_result.new_items[: first_of_trailing_run + 1]
    logger.warning(f"Filtered assistant messages: {run_result.new_items}")
    return run_result
Here is the Python filter I use, if someone wants to try it. (This is inspired by @viniciusarruda’s answer on GitHub.)
It’s actually incredible that I’m still facing this issue — my previous post is here: post id 1251365.
The issue is the same: it relates to the Responses API.
