Hey all - I’ve been doing some tests for the past 2 hours, and I’m pretty certain there’s a noticeable quality difference between the model outputs when stream is true vs. when it’s false.
When stream is set to true, the complete response actually arrives faster (which is a little surprising to me, as I’d expect the response from stream: false to arrive sooner, since it doesn’t have the overhead of stream metadata being sent with each chunk).
The more important point is that the model’s behavior seems to differ across stream settings: it seems to have better reasoning abilities with stream: false.
Is inference being run through a different pipeline on OpenAI’s side depending on the stream setting? That would explain why stream: true is faster but lower quality.
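For context, my comparison rests on the assumption that a streamed response, once its deltas are joined, should be the exact same text a non-streamed call would return. Here’s a minimal sketch of that assumption (assemble_stream and the chunks list are my own illustration, not the actual client API):

```python
# Streamed responses arrive as incremental text deltas; joining them
# should reproduce the full response text byte-for-byte. The `chunks`
# list below is hypothetical stand-in data, not real API output.
def assemble_stream(chunks):
    """Concatenate incremental deltas into the complete response text."""
    return "".join(chunks)

chunks = ["The ", "quick ", "brown ", "fox."]
full = assemble_stream(chunks)
print(full)  # The quick brown fox.
```

If the two modes really do go through different pipelines, then even with temperature 0 the assembled streamed text and the non-streamed text wouldn’t be expected to match, which is what I’ve been checking for.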