Difference in behavior with the `stream` flag in ChatCompletion API

Hey all - I’ve been doing some tests for the past 2 hours, and I’m pretty certain there’s a noticeable quality difference between the model outputs when `stream` is true vs. when it’s false.

When `stream` is set to true, the complete response actually arrives faster, which is a little surprising to me: I’d expect the `stream: false` response to arrive faster, since it doesn’t carry the overhead of the stream metadata sent with each chunk.
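
For anyone who wants to reproduce this, here’s a minimal sketch of how the two modes can be timed against each other. It assumes the v1 `openai` Python SDK with `OPENAI_API_KEY` in the environment; the model name is a placeholder, and `temperature: 0` is set to keep sampling as comparable as possible:

```python
import time

from openai import OpenAI  # assumes the v1 openai Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-4"  # placeholder -- substitute whichever model you're testing
PROMPT = [{"role": "user", "content": "Explain the birthday paradox step by step."}]


def timed_nonstream():
    """Time one blocking request; return (text, seconds to complete response)."""
    t0 = time.perf_counter()
    resp = client.chat.completions.create(
        model=MODEL, messages=PROMPT, temperature=0, stream=False
    )
    return resp.choices[0].message.content, time.perf_counter() - t0


def timed_stream():
    """Drain the whole stream; return (text, seconds until the final chunk)."""
    t0 = time.perf_counter()
    parts = []
    for chunk in client.chat.completions.create(
        model=MODEL, messages=PROMPT, temperature=0, stream=True
    ):
        # Some chunks (e.g. the final one) can arrive with empty choices/deltas.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts), time.perf_counter() - t0
```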

The more important point is that the model’s behavior seems to differ between the two settings: it appears to have noticeably better reasoning with `stream: false`.

Is inference being run through a different pipeline on OpenAI’s side depending on the stream setting? That would explain why `stream: true` is faster and lower quality.

Curious whether anyone could post a few benchmarks comparing the overall speed difference with stream on vs. off, where both return the same token count.
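
In the meantime, here’s a rough way to compare a single run locally with the two helpers sketched above (character counts stand in for token counts; exact counts would need the response `usage` fields or a tokenizer like `tiktoken`). Note that even at `temperature: 0`, outputs aren’t guaranteed to be byte-identical across calls:

```python
ns_text, ns_secs = timed_nonstream()
s_text, s_secs = timed_stream()

print(f"stream: false -> {ns_secs:6.2f}s, {len(ns_text)} chars")
print(f"stream: true  -> {s_secs:6.2f}s, {len(s_text)} chars")
print("outputs identical:", ns_text == s_text)
```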