Thanks a lot, looking forward to the update!
I am seeing the same problem after a function call with gpt-4o-2024-08-06
I am also experiencing this. I can provide a Trace ID if you need. This had no tools called. It was a 15,000 token input.
The trace shows 45,000 tokens
Model: gpt-4.1-mini-2025-04-14
It is showing 3 responses. You should be able to find other instances of this by just looking for responses that have more than one assistant response as outputs (I would assume).
Hello @sardanian @ouldouz @michael_volca could someone post a raw input and raw output of the model? This will help us investigate the issue.
You do know that 15,000 tokens is closer to 60,000 characters, twice this forum's post limit… and such a length would likely involve personal or proprietary data.
A better way to share input and output (besides simply reporting the response ID to someone at OpenAI who can access it) is to re-create the situation in the Playground and save it as a preset with a shared link.
Then note the rate of occurrence when running that input repeatedly: by the end of a 100-token response, the path of prediction may have had on the order of 200000 ** 100 variations, and without the ability to continue from a partial response output (such as Anthropic offers), this may remain a constant looming danger on any input to the misbehaving model.
Hello! We don't need the raw output anymore. We are working on identifying the issue with the information we have currently. We will keep you updated. Thanks for your co-operation on this issue.
Hi OpenAI team,
I'm encountering a repeatable issue with GPT-4o when prompting the model to respond in a structured JSON format, specifically when I ask it to return more than one field inside a JSON object.
This problem rarely occurs when the response is plain text (when I don't ask it to include additional fields in the content object). However, when I require the model to include additional fields in the content object structure (e.g., choices: string), the response:
- Often duplicates output (output_item.added and output_item.done are duplicated)
- Sometimes repeats sending the same output_item event continuously
- In the worst cases, enters what appears to be an infinite loop, repeating until it hits the rate limit
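For what it's worth, one client-side mitigation for the duplicated events is to deduplicate output items by their id as they arrive. A minimal sketch, assuming the streaming event shapes described above (collectUniqueItems is a hypothetical helper, not part of the SDK):

```javascript
// Deduplicate streamed output items by id, keeping only the first
// completed copy of each item. Events are objects shaped like the
// Responses API streaming events, e.g.
//   { type: "response.output_item.done", item: { id: "...", ... } }
function collectUniqueItems(events) {
  const seen = new Set();
  const items = [];
  for (const event of events) {
    if (event.type !== "response.output_item.done") continue;
    if (seen.has(event.item.id)) continue; // skip the duplicated copies
    seen.add(event.item.id);
    items.push(event.item);
  }
  return items;
}
```

This does not reduce the token bill, since the duplicates are still generated server-side, but it keeps the conversation shown to the user clean.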
Hi! Adding my stone to the pile: we have the multiple-output problem with gpt-4.1-mini. @OpenAI_Support, while you are investigating the problem, can you please suggest an alternative model? It's not clear to me whether the problem is specific to "mini" or common to the whole 4.1 family. Thanks!
Hello! Thanks for surfacing this! At the moment, there isn't a server-side fix we can apply for this behavior.
The best workaround is prompt-tuning: add an instruction such as
"Please return one complete answer and then stop."
Users who tighten their prompt in this way generally avoid the duplicate-message behavior. We'll keep an eye on any future model updates and share news if that changes.
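Applied to a Responses API call via the openai npm package, the suggested workaround might look like this (a sketch; the model name and instruction text are only examples, and withStopInstruction is a hypothetical helper):

```javascript
// Hypothetical helper that appends the suggested stop instruction
// to whatever system instructions the application already uses.
function withStopInstruction(instructions) {
  return instructions.trim() +
    "\nPlease return one complete answer and then stop.";
}

// Usage sketch with the openai npm package (needs an API key):
// const response = await client.responses.create({
//   model: "gpt-4.1-mini",
//   instructions: withStopInstruction("You are a helpful assistant."),
//   input: userMessage,
// });
```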
The issue does not seem to occur (as much) with the GPT-4.1 model, but we are also experiencing problems with multiple assistant responses back to back with the 4.1-mini and 4.1-nano models, even when instructing the model to return only one complete answer.
This seems to occur with instructions that are very long. Shorter instructions seem to work better with the mini and nano models. It seems the token load from the large instructions is too much for those models to handle.
Also, I think the Chat Completions API has the property "n" to return "n" generated responses back to the client. Why can this not be implemented for the Responses API?
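For comparison, this is roughly how "n" behaves in Chat Completions (a sketch; the request is shown commented out since it needs an API key, and firstChoiceContent is a hypothetical helper):

```javascript
// Chat Completions returns n alternatives in `choices`; the
// Responses API has no equivalent parameter as of this thread.
// const completion = await client.chat.completions.create({
//   model: "gpt-4o-mini",
//   messages: [{ role: "user", content: "Name a color." }],
//   n: 3, // ask for three independent completions
// });

// Picking one alternative is then explicit on the client side:
function firstChoiceContent(completion) {
  return completion.choices[0].message.content;
}
```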
This is a bug in the model, and we cannot intervene if the solution you propose doesn't work. However, even though it's a bug, responses like this still cost us a lot of money. Who will compensate us for that? Especially when using previous_response_id, a repeated loop causes the input token count to spike, and in the end: boom, your account gets charged $100.
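One way to bound the damage from a runaway loop, assuming the max_output_tokens parameter of the Responses API works as documented, is to cap generation length per call. A sketch (the cap of 1024 is an arbitrary example, and wasTruncated is a hypothetical helper):

```javascript
// Cap output length so a duplicate-output loop cannot run all the
// way to the rate limit. 1024 is an arbitrary example; size it to
// the longest legitimate answer the application expects.
// const response = await client.responses.create({
//   model: "gpt-4.1-mini",
//   previous_response_id: lastResponseId,
//   input: userMessage,
//   max_output_tokens: 1024,
// });

// Hypothetical guard for detecting that the cap was hit, assuming
// the documented incomplete_details shape of the Responses API:
function wasTruncated(response) {
  return response.status === "incomplete" &&
    response.incomplete_details?.reason === "max_output_tokens";
}
```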
This remains a prevalent and critical issue in the Responses API. Here are my findings:
- Adding instructions to respond with only one output doesn't help. The system returns a single output, but then duplicates the response message approximately 20 times until it hits a limit, ending around 70k tokens.
- My system instructions request an XML output with multiple fields. When the XML format is not present, the issue appears to occur less frequently.
- I have a specific set of instructions that consistently reproduces the issue when the user responds with a single-digit number. All prior responses work perfectly. Then, when the agent asks for a quantity and the user replies with something like "4", the problem occurs. Oddly enough, if the user responds with "four", the issue happens less frequently.
This is causing significant delays in response times and increased costs due to excessive and erroneous token usage.
What's the status of getting this fixed?
For context:
- Iâm using the responses API running on gpt-4.1
- An initial set of system instructions is set on the first message, then subsequent messages pass in the previous response ID to maintain context
- Non streaming
- No tool calling
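For reference, the setup described above can be sketched roughly as follows (buildTurn is a hypothetical helper, not part of the SDK; the real network call via the openai npm package is commented out):

```javascript
// Build the request body for one conversation turn: system
// instructions on the first call, previous_response_id afterwards.
function buildTurn({ model, instructions, previousId, input }) {
  const body = { model, input };
  if (previousId) body.previous_response_id = previousId;
  else body.instructions = instructions;
  return body;
}

// Usage sketch (non-streaming, no tools, as in the report above):
// const first = await client.responses.create(
//   buildTurn({ model: "gpt-4.1", instructions: systemInstructions, input: msg1 }));
// const second = await client.responses.create(
//   buildTurn({ model: "gpt-4.1", previousId: first.id, input: msg2 }));
```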
I had a similar problem with gpt-4.1-mini producing about 3 to 15 similar messages.
In my case, the problem disappeared by itself after some time. I did not do anything special, and I have not seen these duplicate messages for a long time now.
Has this issue been resolved in the new Responses API as of 1st July 2025? I'm currently migrating from the Assistants API to the latest Responses API in Node.js, and I continuously receive multiple output results, up to 20. This is causing a long delay in firing the response.done streaming event, and also inconsistent message output results. Sometimes a new message is even added onto the last (i.e., 20th) item of the messages output.
This obviously affects the user experience, with a long delay in receiving a reply, in addition to erroneous messages.
Unfortunately, that doesn't seem to be the case; I still see this issue.
@OpenAI_Support please feel free to look at resp_6864557252f081a1be91daf8407615080f6ebe68ea32079b
I also upgraded to the latest openai npm module.
I did notice that the gpt-4.1 model produces fewer repeated outputs than the nano and mini models. Those models are really unusable in my use case, as the delay, every so often, is so long that it will lose the interest of the customer. However, gpt-4.1 has a much higher cost, obviously.
It seems to happen more after a few function calls rather than at the beginning of the interaction.
Interestingly, with the Assistants API this did not happen at all. Maybe I upgraded too soon!
What I truly cannot comprehend is how this could possibly be an issue with the model itself, when the very same model functions flawlessly with the Chat Completions API. Aren't Chat Completions and Responses merely different means of accessing the same underlying model? How could the use of one API over the other influence the model's fundamental behavior? Above all, it remains unclear just how significant this problem is. Is there an estimated timeline for its resolution? I am genuinely beginning to consider alternative providers, as I am far from satisfied with OpenAI's prioritization in addressing customer concerns.
I have the same issue with multiple outputs under gpt-4o-mini and the Responses API.
It looks like the issue is not model-related but rather API-related.
We're running into the same issue: the output loops 20 times before the stream stops, using 4.1-mini. This is problematic because every call uses 20 times as many tokens. Would it work if I took the first output and cancelled the stream?
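Taking the first output and cancelling the stream should at least cut the latency, though tokens generated before the abort are presumably still billed. A sketch, assuming the Responses API streaming events described earlier in the thread (shouldAbort is a hypothetical helper, and the exact SDK streaming/abort interface may differ by version):

```javascript
// Predicate: stop consuming the stream once the first output item
// has finished. `state` carries the count across events.
function shouldAbort(event, state) {
  if (event.type === "response.output_item.done") state.doneItems += 1;
  return state.doneItems >= 1;
}

// Usage sketch (not verified against a specific SDK version):
// const stream = await client.responses.create({ model, input, stream: true });
// const state = { doneItems: 0 };
// for await (const event of stream) {
//   handle(event);
//   if (shouldAbort(event, state)) break; // stop reading further events
// }
```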
Hi there, I have encountered the same issue on GPT-4.1-mini when using Assistants, Threads and Runs, so it's certainly not just a Responses API issue. The same problem never occurs when using GPT-4o-mini.
