GPT 5.2 streaming Responses API sending two or more messages in the same response stream?

Can someone clarify the intended behavior when GPT 5.2 generates multiple messages in a single response stream?

These often seem to be artifacts of an internal iterative drafting process: earlier messages appear to be drafts of later ones. That is not always the case, however.

Sometimes I receive messages that are meaningful. Sometimes they are garbled.

These are six response.output_item.added events for assistant message items that arrived in one relatively coherent response stream:

{"type":"response.output_item.added","sequence_number":97,"output_index":1,"item":{"id":"msg_0be97e460258cdf801693bb86bb9388197b6b1a8bd0877bd20","type":"message","status":"in_progress","content":[],"role":"assistant"}}
{"type":"response.output_item.added","sequence_number":233,"output_index":2,"item":{"id":"msg_0be97e460258cdf801693bb86be05081978aae69f5cfea6a0c","type":"message","status":"in_progress","content":[],"role":"assistant"}}
{"type":"response.output_item.added","sequence_number":243,"output_index":4,"item":{"id":"msg_0be97e460258cdf801693bb86cc6f481978124e9be569fa224","type":"message","status":"in_progress","content":[],"role":"assistant"}}
{"type":"response.output_item.added","sequence_number":290,"output_index":5,"item":{"id":"msg_0be97e460258cdf801693bb86cd6e8819789ed8e9efd3bd755","type":"message","status":"in_progress","content":[],"role":"assistant"}}
{"type":"response.output_item.added","sequence_number":384,"output_index":7,"item":{"id":"msg_0be97e460258cdf801693bb86eb858819797068d1b216189f8","type":"message","status":"in_progress","content":[],"role":"assistant"}}
{"type":"response.output_item.added","sequence_number":444,"output_index":9,"item":{"id":"msg_0be97e460258cdf801693bb86ef3108197a00d7ff59bdfdbcc","type":"message","status":"in_progress","content":[],"role":"assistant"}}
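For illustration, here's a minimal sketch of how one could surface these extra items from the event stream; it assumes the current openai Python SDK, and the model id and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Stream one response and record every assistant "message" item the API opens.
# Expected: a single message item; in the affected runs we see several.
message_items = []
stream = client.responses.create(
    model="gpt-5.2",             # assumed model id
    input="placeholder prompt",  # placeholder
    stream=True,
)
for event in stream:
    if event.type == "response.output_item.added" and event.item.type == "message":
        message_items.append((event.output_index, event.item.id))

print(f"{len(message_items)} message item(s) opened in this stream")
for output_index, item_id in message_items:
    print(output_index, item_id)
```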


Hi @tom_osullivan!

Could you please check whether you are being charged for the output of the unexpected messages? I am raising this behavior with the team, and that would be an important detail to include.

Hi @vb!

How would you like me to pick up that information?

Our application reads the "usage" object from the response JSON returned in the response.completed event, and the "output_tokens" count does reflect the total across these batches of erratic messages.
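For illustration, a sketch of reading that usage object from the stream (same assumptions as above; model id and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5.2",             # assumed model id
    input="placeholder prompt",  # placeholder
    stream=True,
)
for event in stream:
    # The aggregated usage arrives on the final response.completed event.
    if event.type == "response.completed":
        usage = event.response.usage
        print("output_tokens:", usage.output_tokens)
        print("total_tokens:", usage.total_tokens)
```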

If you’d like me to check elsewhere please let me know; I’m happy to help.


Yes, that would be very helpful.

What I meant is whether you could check if you are being charged for each message returned by the model, billed as output tokens.

If you did not capture the cost object, the next best option is to review the usage dashboard.

I created a new project and new API key for diagnostic purposes, then issued a number of requests until the erratic behavior triggered.

Two out of three requests with the same prompt triggered multiple messages in the assistant response. One of those was innocuous; the other was not.

I set the store flag to true, so the responses are logged. If you have visibility into those, I can share response IDs.
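For reference, the diagnostic requests look roughly like this (same assumptions as above; the prompt is a placeholder), with store=True so each response is retained in the logs and can be looked up by id:

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5.2",                               # assumed model id
    input="<prompt that triggers the behaviour>",  # placeholder
    store=True,   # retain the response so it shows up in the platform logs
    stream=True,
)
for event in stream:
    # The first event carries the response object, including its id.
    if event.type == "response.created":
        print("response id:", event.response.id)
```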

Since the cost API isn't granular at the response-ID level, I can't tell you exactly what is being charged, but it does appear that I am being charged for all output tokens in these responses.

The token counts in the logs and costs reported in the dashboard are consistent.


Thank you again for looking into this.

Sharing the request IDs can be helpful when reviewing the details.
That said, I definitely want to escalate this if users are being charged for erroneous responses.

I’ll try to repro this as well in order to get more clarity.


@vb - DM sent with the response ID and a screen capture from the log.


FYI we are now seeing the same erratic response behavior under gpt-5.1.

You seem to be describing two separate symptoms that have now appeared across models (though perhaps while running the same tasks or app setup):

  • the AI does not properly terminate at the end of its final response (which would normally be caught by the internal stop-sequence token detection), but instead opens another assistant final response and repeats itself as if the first had not been received, writing whatever it would write when placed back into such a "restart" context without an intervening user message;
  • the AI itself emits a bad sequence that is not a mid-sentence stop, but is enough to make the API backend detect the beginning of another "role" or "output".

See whether I've characterized the symptoms you report correctly, and whether this description of the unseen generation aligns with what you observe.

The events presented above (not a complete stream) show that sequences of varying length occur before a new output list item begins. One line of inquiry that would inform an appropriate OpenAI investigation: does this tend to happen at positions of semantic significance, such as the start of a new paragraph, or does it look completely random?
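One way to check that from the client side (a sketch, again assuming the Python SDK, with a placeholder model id and prompt): accumulate the text per output item and, whenever a new message item is opened, print the tail of the previous item to see whether the split lands at a paragraph boundary or mid-sentence.

```python
from collections import defaultdict

from openai import OpenAI

client = OpenAI()

text_by_index = defaultdict(str)  # text accumulated so far, per output item

stream = client.responses.create(
    model="gpt-5.2",             # assumed model id
    input="placeholder prompt",  # placeholder
    stream=True,
)
for event in stream:
    if event.type == "response.output_text.delta":
        text_by_index[event.output_index] += event.delta
    elif event.type == "response.output_item.added" and event.item.type == "message":
        if text_by_index:
            prev = max(text_by_index)
            tail = text_by_index[prev][-80:]
            # Paragraph boundary or mid-sentence cut-off?
            print(f"new message item at output_index {event.output_index}; "
                  f"previous item ends with: {tail!r}")
```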

Then: is it an artifact that can be eliminated entirely with "stream":false, or does a non-streaming request also produce separate items in the response "output" array (not the SDK's "output_text", which is only a collector)?
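The non-streaming check is straightforward (a sketch under the same assumptions): make the request with stream=False and count the message items in the raw output array rather than reading output_text.

```python
from openai import OpenAI

client = OpenAI()

resp = client.responses.create(
    model="gpt-5.2",             # assumed model id
    input="placeholder prompt",  # placeholder
    stream=False,
)

# resp.output is the raw output array; resp.output_text only concatenates it.
message_items = [item for item in resp.output if item.type == "message"]
print(f"{len(message_items)} message item(s) in the output array")
```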

(Diagnosis would be super-easy if we were treated as qualified and could receive a parallel stream of the underlying unfiltered sampled token integers…up to 201,088.)