I am seeing this too with the Responses API (non-streaming): I get multiple output items with nearly identical content. I think there are several bugs in the Responses API; the Agents SDK works around them by simply taking the first output item that matches the desired return type.
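If you need a stopgap in your own code while this is investigated, here is a minimal sketch of that same idea, keeping only the first output item of type "message". The model and prompt are placeholders, not taken from this thread:

from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="What was the last dollar value discussed?",
)

# The bug produces several near-identical "message" items, so resp.output_text
# would concatenate duplicates. Keep only the first message item instead.
first_message = next(
    (item for item in resp.output if item.type == "message"),
    None,
)

if first_message is not None:
    text = "".join(
        part.text for part in first_message.content if part.type == "output_text"
    )
    print(text)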
This is a model training fault: the model is not emitting the stop sequence of the message format. Are you using “gpt-4.1-mini”, perhaps?
You can add your own, using the stop parameter... oh, wait, no you can’t, because Responses doesn’t offer any token-level control, which makes it unsuitable for a great many things, such as building production applications against OpenAI models. And it still wouldn’t work inside a tool call, because stopping at and removing something like "}\n\n would produce parsing or validation failures if you are using any API SDK methods or a strict function schema.
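To make that tool-call breakage concrete, here is a tiny illustration with a made-up argument payload; the stop string and values are purely hypothetical:

import json

# Hypothetical function-call arguments the model is emitting:
full_arguments = '{"city": "Paris"}\n\n'

# If a stop sequence like '"}' fired, the API would also strip the stop text,
# so a strict parser is handed a truncated fragment:
truncated = full_arguments.split('"}', 1)[0]   # '{"city": "Paris'

try:
    json.loads(truncated)
except json.JSONDecodeError as err:
    print(f"validation failure, as expected: {err}")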
Bo from the API team. Can you share a few more details, or the exact request shape (or the request and response ID), so we can take a look? Also, how frequently do you see this happening? Is it only with gpt-4.1-mini?
bodyStruct = {
    "model": "#model#",
    "input": "What was the last dollar value discussed in the message thread? Please return the value as a number with no dollar sign or commas. If no dollar amount was discussed then return 0.",
    "instructions": "You are an SMS collections agent. Use the attached message thread to answer questions the user asks.",
    "previous_response_id": "#MsgID#",
    "tools": [{
        "type": "file_search",
        "vector_store_ids": ["#responseData.vectorID#"]
    }],
    "include": ["file_search_call.results"],
    "text": {
        "format": {
            "name": "get_last_discussed_dollar_value",
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "last_discussed_dollar_value": { "type": "number" }
                },
                "required": ["last_discussed_dollar_value"],
                "additionalProperties": false
            }
        }
    }
}
I tried with gpt-4.1-nano and it seems to work correctly, returning only 2 items in the output field. The first item is the result/info from the file search, and the second is the response the AI generated to the question.
gpt-4.1-mini-2025-04-14 fairly consistently gives me more than one AI response, although the exact number is inconsistent, often 2-19. That’s not counting the result of the file search, which always seems to be the first item in the output. The multiple responses in my case are always exactly the same response, although that could be due to the data and the phrasing of my question.
gpt-4.1-mini-2025-04-14 also takes significantly longer. In my current workflow I make 3 requests sequentially; nano finishes all three in 7-9 seconds, while mini often takes 20-50 seconds.
gpt-4.1-2025-04-14 also seems to work fine, returning only one response. It averages 7-13 seconds for the same workflow I used with the others.
UPDATE: OP says the message IDs are all the same, but in my case they are unique, although at first glance the message IDs may look the same since they start and end with the same characters. @Oliver_Michalik could you double-check whether the IDs you’re seeing are truly identical?
Closing messages and tool calls with the correct token, and emitting to functions properly, both need post-training. So does not misusing multi_tool_use and not fabricating function names when inside it.
It is a strong indicator that a new version of this model must be developed.
The forum is full of anecdotes, and I could follow almost any complaint with “I can guess what model you are using.”
The Responses endpoint compounds this by not stopping on several possible alternate stop sequences, such as another (token)assistant(token) header. That’s <200006>assistant<200008>, for reference…
API users could help themselves if the stop parameter and logit_bias were offered, and worked at any sampling temperature and top_p. Special tokens could then also be addressed with logit_bias.
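For reference, this is roughly what that self-service looks like today on Chat Completions, which still accepts stop and logit_bias. The token ID below is the <200006> header token quoted above; whether the endpoint will actually honor a bias against a special token is an assumption here, not a guarantee:

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Answer once, briefly."}],
    temperature=0.7,
    top_p=1.0,
    # Discourage the <200006> header token discussed above (assumes the
    # endpoint accepts a bias on this ID instead of rejecting it).
    logit_bias={"200006": -100},
    # Plain-text fallback stop in case a second header leaks as ordinary text.
    stop=["\nassistant"],
)

print(completion.choices[0].message.content)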
Thanks for flagging this. We can confirm the Responses-API duplication you’re seeing with GPT-4.1 and are actively investigating. We’ll update this thread as soon as we have a fix.
If you force a specific function in Responses with the tool_choice parameter, it should be released after the first call in the Prompts Playground, just as you must do when developing an application.
It’s not directly gpt-4.1-mini in that case, though: it’s actually o4-mini that has a big problem with such a function when tool_choice is set to “auto”, sometimes delivering a bill of thousands of tokens if the user asks “call these in parallel”, with the internal iterations unseen.
The gpt-4.1-mini model has been in the “evaluated a while back, disregard” bucket for me. However, I can prod it more until I get poor results, if someone’s listening. Like this report of two parallel tool calls, where the AI model confabulates the returns, on the simplest of contexts and with the tool choice set to “auto”.
How could the AI conclude that two six-sided die rolls could add to 20, even if the function returning specific values was unclear? (The ID mechanism for matching a tool call to its tool return has no internal equivalent for AI consumption; the pairs are apparently just ordered, which demands quality attention heads.)
Your preset worked for me (after first removing the context messages). I notice that in your screenshot you have tool_choice set. My understanding is that this parameter will force the model to call your chosen tool in a loop until the death of the universe.
The model could also benefit from improved prompt engineering if you’re still running into issues after making sure tool_choice is unset, which it already is in your preset.
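For what it’s worth, a hedged sketch of releasing a forced tool_choice after the first call against the Responses API could look like the following; the roll_die function, its hard-coded result, and the model are illustrative assumptions:

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "roll_die",
    "description": "Roll one six-sided die and return the result.",
    "parameters": {"type": "object", "properties": {}, "additionalProperties": False},
}]

# First request: force the specific function once.
first = client.responses.create(
    model="gpt-4.1-mini",
    input="Roll a die for me.",
    tools=tools,
    tool_choice={"type": "function", "name": "roll_die"},
)

# Answer every forced call, matching on call_id.
tool_outputs = [
    {"type": "function_call_output", "call_id": item.call_id, "output": "4"}
    for item in first.output
    if item.type == "function_call"
]

# Second request: hand back the results and release tool_choice to "auto"
# so the model can stop calling the function and answer in text.
followup = client.responses.create(
    model="gpt-4.1-mini",
    previous_response_id=first.id,
    input=tool_outputs,
    tools=tools,
    tool_choice="auto",
)

print(followup.output_text)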
Thanks again for the previous fix. However, I’m still seeing multiple outputs being generated from a single API call, especially when dealing with long input contexts (~19K tokens).
No tool or file search: This was a plain chat completion with long context, no function calls or tool use.
Session ID: resp_68219e72837881919a97b567dc4ddd8f0eb9069adcc6eaa2
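In case it helps with triage, here is a small sketch of pulling that stored response back by its ID and counting output items by type, which makes the duplication visible (assumes the response is still retrievable on the account that created it):

from openai import OpenAI

client = OpenAI()

resp = client.responses.retrieve("resp_68219e72837881919a97b567dc4ddd8f0eb9069adcc6eaa2")

# Tally output items by type; with the bug you see more than one "message" item.
counts = {}
for item in resp.output:
    counts[item.type] = counts.get(item.type, 0) + 1

print(counts)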
Hello! It appears to be a model behavior issue, and I’ve escalated it to the engineering team. We’ll investigate and implement a fix as soon as possible.