Speed comparison for Stream vs Non-stream in Chat Completion

Hello, I'm planning to switch my requests from non-streaming to streaming, similar to the approach used by the ChatGPT web app.

How does the speed of a non-stream request compare with a stream request? Is there any significant difference?
I did a small test and the difference seems minor, but I'd be glad if anyone has more extensive tests or benchmarks and could share the results here.

The difference in speed depends on the language you are using and the amount of text you expect the completion API to return. Streaming makes sense if you want your app to have better UX, since the user doesn't need to wait until the whole response is complete. Even some simple responses can take up to 10 seconds, especially in non-English languages.
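
If you want to measure this yourself, the two numbers worth comparing are the time until the first chunk arrives and the time until the last one. Here is a rough sketch in Python. It does not call the real API: `fake_stream` is a made-up stand-in that yields chunks with a fixed delay, and `measure_stream` is a hypothetical helper name, so swap in your own streaming iterator to get real figures.

```python
import time
from typing import Iterable, Iterator, List, Tuple


def fake_stream(chunks: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streamed response: one chunk per delay."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulated generation/network time per chunk
        yield chunk


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, str]:
    """Return (seconds to first chunk, seconds to full text, full text)."""
    start = time.perf_counter()
    first_chunk_s = None
    parts = []
    for chunk in chunks:
        if first_chunk_s is None:
            # What a streaming user waits for before seeing any text.
            first_chunk_s = time.perf_counter() - start
        parts.append(chunk)
    # What a non-streaming user waits for: the complete response.
    total_s = time.perf_counter() - start
    if first_chunk_s is None:  # empty stream
        first_chunk_s = total_s
    return first_chunk_s, total_s, "".join(parts)


chunks = ["Hello", ",", " world", "!"] * 5
first_chunk_s, total_s, text = measure_stream(fake_stream(chunks, 0.01))
print(f"first chunk after {first_chunk_s:.3f}s, full text after {total_s:.3f}s")
```

In a test like this the total time is about the same either way; what streaming changes is how soon the user sees something. That would explain why a small end-to-end test shows only a minor difference.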
