Thanks a lot, looking forward to the update
I am seeing the same problem after a function call with gpt-4o-2024-08-06.
I am also experiencing this. I can provide a Trace ID if you need one. No tools were called. The input was 15,000 tokens.
The trace shows 45,000 tokens
Model: gpt-4.1-mini-2025-04-14
It is showing 3 responses. You should be able to find other instances of this just by looking for responses that have more than one assistant message in the output (I would assume).
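For anyone who wants to scan for this, here is a rough sketch of that check (assuming the official `openai` Python SDK and a stored response; the response ID is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Retrieve a stored response and count its assistant "message" items.
# A normal response carries exactly one; more than one points to the
# duplicated-output bug described in this thread.
response = client.responses.retrieve("resp_...")  # placeholder ID

message_items = [item for item in response.output if item.type == "message"]
if len(message_items) > 1:
    print(f"suspect response: {len(message_items)} assistant messages in output")
```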
Hello @sardanian @ouldouz @michael_volca, could someone post the raw input and raw output of the model? This will help us investigate the issue.
You do know that 15,000 tokens is closer to 60,000 characters, twice this forum's "post" limit… and such a length would likely involve personal or proprietary data.
A better way to share the input and output (besides simply reporting the response ID to someone at OpenAI who can access it) is to re-create the "situation" in the Playground and save it as a preset with a shared link.
Then note the rate of occurrence when repeatedly running that input: by the end of a 100-token response, the prediction path may have had 200000 ** 100 variations, and without the ability to continue from a partial response output (as Anthropic offers), this may remain a constant looming danger on any input to the misbehaving model.
Hello! We don't need the raw output anymore. We are working on identifying the issue with the information we currently have. We will keep you updated. Thanks for your cooperation on this issue.
Hi OpenAI team,
I'm encountering a repeatable issue with GPT-4o when prompting the model to respond in a structured JSON format, specifically when I ask it to return more than one field inside a JSON object.
This problem rarely occurs when the response is plain text (when I don't ask it to include additional fields in the content object). However, when I require the model to include additional fields in the content object structure (e.g., choices: string), the response (see the repro sketch after this list):
- Often duplicates output (output_item.added and output_item.done are duplicated)
- Sometimes re-sends the same output_item event continuously
- In worst cases, enters what appears to be an infinite loop, repeating until reaching the rate limit
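For context, a minimal sketch of the kind of request that triggers this for me (assuming the official `openai` Python SDK; the schema, prompt, and schema name are simplified placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Streaming request that asks for a JSON object with more than one field.
# With a single plain-text field the problem rarely appears for me; with
# the extra "choices" field, output_item events start duplicating.
stream = client.responses.create(
    model="gpt-4o",
    input="Answer the question, then propose follow-up choices.",
    text={
        "format": {
            "type": "json_schema",
            "name": "answer_with_choices",  # placeholder schema name
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "content": {"type": "string"},
                    "choices": {"type": "string"},
                },
                "required": ["content", "choices"],
                "additionalProperties": False,
            },
        }
    },
    stream=True,
)

seen = set()
for event in stream:
    if event.type in ("response.output_item.added", "response.output_item.done"):
        key = (event.type, event.item.id)
        if key in seen:
            print("duplicate event:", key)  # should never happen
        seen.add(key)
```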
Hi! Adding my stone to the pile: we have the multiple-output problem with gpt-4.1-mini. @OpenAI_Support while you are investigating the problem, can you please suggest an alternative model? It's not clear to me whether the problem is specific to "mini" or common to the whole 4.1 family. Thanks!
Hello! Thanks for surfacing this! At the moment, there isn't a server-side fix we can apply for this behavior.
The best workaround is prompt-tuning: add an instruction such as
"Please return one complete answer and then stop."
Users who tighten their prompt in this way generally avoid the duplicate-message behavior. We'll keep an eye on any future model updates and share news if that changes.
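Applied via the API, that could look like the following (a minimal sketch assuming the official `openai` Python SDK; the model and prompt are illustrative):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Tighten the instructions so the model stops after one answer.
response = client.responses.create(
    model="gpt-4.1-mini",
    instructions=(
        "You are a helpful assistant. "
        "Please return one complete answer and then stop."
    ),
    input="Summarize the incident report in three bullet points.",
)
print(response.output_text)
```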
The issue does not seem to occur (as much) with the GPT-4.1 model, but we are also experiencing problems with multiple assistant responses back to back, even when instructing the model to return only one complete answer, with the 4.1-mini and 4.1-nano models.
This seems to occur with very long instructions. Shorter instructions work better with the mini and nano models; it seems the token count of the large instructions is too much for those models to handle.
Also, I think the Chat Completions API has an "n" parameter to return "n" generated responses to the client. Why can this not be implemented for the Responses API?
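For reference, this is the Chat Completions behavior being asked about (a minimal sketch with the official `openai` Python SDK; `n` is a documented Chat Completions parameter, while the Responses API exposes no equivalent as far as I know):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Chat Completions can return several candidates from one request via "n".
completion = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Suggest a name for a bakery."}],
    n=3,  # ask for three alternative completions
)
for choice in completion.choices:
    print(choice.index, choice.message.content)
```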
This is a bug in the model, and we cannot intervene if the solution you propose doesn't work. However, even though it's a bug, responses like this still cost us a lot of money. Who will compensate us for that? Especially when using previous_response_id, a repeated loop causes the input token count to spike, and in the end: boom, your account gets charged $100.
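Until this is fixed, one defensive pattern is to cap each turn and watch usage (a minimal sketch assuming the official `openai` Python SDK; the cap, threshold, prompt, and response ID are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

response = client.responses.create(
    model="gpt-4.1-mini",
    input="...",  # actual prompt elided
    previous_response_id="resp_...",  # placeholder for the prior turn
    max_output_tokens=1024,  # hard cap so a runaway response cannot snowball
)

# Abort a multi-turn chain as soon as usage spikes instead of letting
# previous_response_id snowball the input token count turn after turn.
if response.usage.total_tokens > 50_000:  # arbitrary safety threshold
    raise RuntimeError("token usage spiked; aborting chain to limit cost")
```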