Hey all - I’ve been doing some tests for the past 2 hours, and I’m pretty certain there’s a noticeable quality difference between the model outputs when stream is true vs. when it’s false.
When stream is set to true, the complete response actually arrives faster (which is a little surprising to me, as I’d expect the response from stream: false to arrive sooner, since it doesn’t have the overhead of stream metadata being sent with each chunk).
The more important point is that the model’s behavior seems to differ across stream settings: it seems to have better reasoning abilities with stream: false.
Is inference being run through a different pipeline on OpenAI’s side depending on the stream setting? That would explain why stream: true is faster but lower quality.
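For context, my comparison rests on the assumption that a streamed response, once its deltas are joined, should be the exact same text a non-streamed call would return. Here’s a minimal sketch of that assumption (assemble_stream and the chunks list are my own illustration, not the actual client API):

```python
# Streamed responses arrive as incremental text deltas; joining them
# should reproduce the full response text byte-for-byte. The `chunks`
# list below is hypothetical stand-in data, not real API output.
def assemble_stream(chunks):
    """Concatenate incremental deltas into the complete response text."""
    return "".join(chunks)

chunks = ["The ", "quick ", "brown ", "fox."]
full = assemble_stream(chunks)
print(full)  # The quick brown fox.
```

If the two modes really do go through different pipelines, then even with temperature 0 the assembled streamed text and the non-streamed text wouldn’t be expected to match, which is what I’ve been checking for.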