How to prevent the API from returning multiple outputs?

I am using the responses API with file search.

However, sometimes it creates 2, and sometimes even 15+ output messages.

Even in the Logs page on the dashboard I see that the assistant sent 15 similar messages in a row and it eats tokens like crazy.

I have tried setting parallel_tool_calls to false, yet it still creates multiple output messages (with the same message ID).

Is there a way to force it to create only one output?

There doesn’t seem to be any “output count limit” in the API documentation.

3 Likes

I am seeing this too with the Responses API, non-streaming. I get multiple output items with nearly identical content. I think there are several bugs in the Responses API. The Agents SDK works around these bugs by just taking the first output item that fits the desired return type.
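Until that is fixed server-side, the same workaround can be applied in your own code. A minimal sketch, assuming the official openai Python SDK and that the duplicates are all output items of type "message" (the prompt and model here are just placeholders):

from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="What is the capital of France?",
)

# Keep only the first item of type "message" and ignore any duplicates after it.
first_message = next((item for item in resp.output if item.type == "message"), None)

if first_message is not None:
    text = "".join(
        part.text for part in first_message.content if part.type == "output_text"
    )
    print(text)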

1 Like

This is a model training fault: the model is not emitting the stop sequence that closes the message format. Are you using “gpt-4.1-mini” perhaps?

You can add your own, using the stop parameter - oh, wait, no you can't, because Responses doesn't offer any token-level control, and is thus unsuitable for a great many things, such as building production applications against OpenAI models. And it still wouldn't work inside a tool call, because you'd get parsing or validation errors if stopping at and removing something like "}\n\n while using any API SDK methods or a strict function schema.

Yes, gpt-4.1-mini

A solution is to set max_output_tokens, if you know the expected output length.

It will write the first response, and the second response will be cut off when the limit is reached. Then we just use the first response.
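A rough sketch of that approach, assuming the Python SDK; the prompt and the max_output_tokens value are placeholders you would tune to your own expected response size:

from openai import OpenAI

client = OpenAI()

# Cap the total output so any duplicate message after the first gets truncated.
resp = client.responses.create(
    model="gpt-4.1-mini",
    input="Summarize the attached message thread in one sentence.",
    max_output_tokens=200,  # tune to the expected length of a single response
)

# Then use only the first "message" item and discard the truncated remainder.
first = next((item for item in resp.output if item.type == "message"), None)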

2 Likes

Thanks for taking the time to flag this, I’ve raised the issue with OpenAI.

2 Likes

Bo from the API team. Can you share a bit more detail, or the exact request shape (or the request and response IDs), so we can take a look? Also, how frequently do you see this happening? Is it only with gpt-4.1-mini?

2 Likes

I think I’m having the same issue

The body of my request looks like this:

bodyStruct = {
    "model": "#model#",
    "input": "What was the last dollar value discussed in the message thread? Please return the value in the format of a number with no dollar sign or commas. If no dollar amount was discussed then return 0.",
    "instructions": "You are an SMS collections agent. Use the attached message thread to answer questions the user asks.",
    "previous_response_id": "#MsgID#",
    "tools": [{
        "type": "file_search",
        "vector_store_ids": ["#responseData.vectorID#"]
    }],
    "include": ["file_search_call.results"],
    "text": {
        "format": {
            "name": "get_last_discussed_dollar_value",
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {
                    "last_discussed_dollar_value": { "type": "number" }
                },
                "required": ["last_discussed_dollar_value"],
                "additionalProperties": false
            }
        }
    }
}
I tried with gpt-4.1-nano and it seems to work correctly, returning only 2 items in the output field. The first item is the result/info from the file search, and the second is the response the AI generated to the question.

gpt-4.1-mini-2025-04-14 fairly consistently gives me more than one AI response, although the exact number is inconsistent, often 2-19. That's not counting the result of the file search, which always seems to be the first item in the output. The multiple responses in my case are always the exact same response, although that could be due to the data and the phrasing of my question.

gpt-4.1-mini-2025-04-14 also takes significantly longer. In my current workflow I make 3 requests sequentially; nano finishes all three in 7-9 seconds, while mini often takes 20-50 seconds.

gpt-4.1-2025-04-14 also seems to work fine, returning only one response. It averages 7-13 seconds for the same workflow I used on the others.

UPDATE: OP says the message IDs are all the same, but in my case they are unique, although at first glance the message IDs may seem the same since they start and end with the same characters. @Oliver_Michalik, could you double-check whether the IDs you're seeing are truly the same?
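For anyone parsing this kind of response, here is a rough Python sketch of the same request as the bodyStruct above and how to pick out the answer regardless of how many duplicate messages come back (the vector store ID is a placeholder; the field name comes from the schema in that request):

import json
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-4.1-mini",
    input="What was the last dollar value discussed in the message thread? Please return the value in the format of a number with no dollar sign or commas. If no dollar amount was discussed then return 0.",
    tools=[{"type": "file_search", "vector_store_ids": ["vs_example123"]}],
    text={
        "format": {
            "name": "get_last_discussed_dollar_value",
            "type": "json_schema",
            "schema": {
                "type": "object",
                "properties": {"last_discussed_dollar_value": {"type": "number"}},
                "required": ["last_discussed_dollar_value"],
                "additionalProperties": False,
            },
        }
    },
)

# The file_search_call item comes first, so select the first "message" item by type,
# not by position, and decode the JSON produced by the json_schema text format.
first_message = next(item for item in resp.output if item.type == "message")
payload = json.loads(
    "".join(p.text for p in first_message.content if p.type == "output_text")
)
value = payload["last_discussed_dollar_value"]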

1 Like

The model seems to be the common fault.

Its behavior can be well defined:

  • Can’t follow the ChatML format.
  • Can’t consistently send to tool recipients.

Closing messages and tool calls with the correct token, and emitting to functions, both need post-training - along with not misusing multi_tool_use and not fabricating functions when inside it.

It is a strong indicator that a new version of this model must be developed.

The forum is full of anecdotes, and I can answer almost any such complaint with “I can guess what model you are using”.

The Responses endpoint compounds this by not stopping on several possible alternate stop sequences, such as another (token)assistant(token). That’s <200006>assistant<200008> for you…

API users could help themselves if the stop parameter and logit_bias were offered, and worked at any sampling temperature and top_p. Special tokens could then also be addressed with logit_bias.
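For contrast, a minimal Chat Completions sketch with those controls (the stop string, the empty logit_bias map, and the prompt are purely illustrative):

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "List three colors, one per line."}],
    stop=["\n\n"],   # plain-text stop sequence: generation halts if it is produced
    logit_bias={},   # left empty here; takes token-ID -> bias entries in -100..100
)
print(completion.choices[0].message.content)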

Thanks for flagging this. We can confirm the Responses-API duplication you’re seeing with GPT-4.1 and are actively investigating. We’ll update this thread as soon as we have a fix.

2 Likes

Quick update: we’ve rolled out a fix. If you still see multiple outputs, let us know in this thread and we’ll investigate right away.

4 Likes

This is still an easily-reproducible loop-fest of a model.

About 10 function calls in: resp_681949af5b2481918a3c98a03eda7c210d347f8b65e1dc0f

Playground preset, ready to call functions forever: https://platform.openai.com/playground/p/hGd3yvQdSzB6OpVFNG3bM9Hl?mode=chat

If a specific function in Responses is forced with the tool_choice parameter, that forcing should be released after the first call in the Prompts Playground, just as one must do when developing an application.
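A sketch of that pattern, assuming the Python SDK and a hypothetical function tool named roll_die: force the function once, then hand back its output with tool_choice released to "auto":

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "roll_die",  # hypothetical tool, for illustration only
    "description": "Roll a six-sided die.",
    "parameters": {"type": "object", "properties": {}, "required": [], "additionalProperties": False},
}]

# First call: force the specific function.
first = client.responses.create(
    model="gpt-4.1-mini",
    input="Roll a die for me.",
    tools=tools,
    tool_choice={"type": "function", "name": "roll_die"},
)
call = next(item for item in first.output if item.type == "function_call")

# Second call: return the tool result and release the forcing,
# otherwise the model is pushed to call the same function again.
followup = client.responses.create(
    model="gpt-4.1-mini",
    previous_response_id=first.id,
    input=[{
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": "4",
    }],
    tools=tools,
    tool_choice="auto",
)
print(followup.output_text)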

Not directly gpt-4.1-mini in that case, though: it’s actually o4-mini that has a big problem with such a function when tool_choice is set to “auto” - sometimes delivering a bill of thousands of tokens if the user asks “call these in parallel”, with the internal iterations unseen.


The gpt-4.1-mini model has been “evaluated a while back, disregard” for me. However, I can prod it more until I get poor results, if someone’s listening. Like this report of two parallel tool calls, where the AI model confabulates the returns, on the simplest context seen before and tool_choice: “auto”.

How could the AI come to the conclusion that two six-sided die rolls could add up to 20, even if the function returning specific values was unclear? (The ID mechanism for matching a tool call to a tool return has no internal equivalent for AI consumption; they are apparently just ordered, which demands quality attention heads.)

1 Like

@OpenAI_Support

I’m currently getting this issue with GPT-4.1 using the Responses API.

The output field will contain multiple objects of the message type. The text of each is nearly identical, and they all get combined in the output_text field.

3 Likes

Your preset worked for me (after removing context messages first). I notice that in your screenshot you have tool_choice set. My understanding is that this parameter will force the model to call your tool choice in a loop until the death of the universe.

The model could also benefit from improved prompt engineering if you’re still running into issues after making sure tool_choice is unset (which it already is in your preset).

1 Like

Hello _j, we are unable to reproduce the issue. Could you confirm whether you are still seeing it?
I only see one output, as expected.

1 Like

Yep, gpt-4.1-mini and o4-mini seem to be behaving well for my further short-context attempts, thanks for following up.

Except for one with a function (strict: false) and json_object, as in that Playground preset now:

No request ID to supply with that failure of the API.

Hello! Is this related to the multiple-output issue? If not, could you please create a new thread for it?

1 Like

Thanks again for the previous fix. However, I’m still seeing multiple outputs being generated from a single API call, especially when dealing with long input contexts (~19K tokens).

No tool or file search: This was a plain chat completion with long context, no function calls or tool use.
Session ID: resp_68219e72837881919a97b567dc4ddd8f0eb9069adcc6eaa2

1 Like

Hi dear OpenAI team, the duplication problem in GPT-4.1 and 4.1-mini is still there. Please consider solving it again; the problem persists.

Please take a critical look; it is not solved.

Hello! It appears to be a model behavior issue, and I’ve escalated it to the engineering team. We’ll investigate and implement a fix as soon as possible.

2 Likes