Can I disable slow response streaming while using GPT Plus?

Responses are streamed over the network word by word. It seems obvious that the response is already complete somewhere on the server side, and slowly printing it out is just time-consuming.

Is there a way to switch to a normal “chatting” mode with instant messages?

I imagine it would improve performance as well.


My understanding is that the response is not already complete on the server side, but that the nature of the algorithm is such that it’s generating those response tokens in real time.
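A minimal sketch of what token-by-token generation means for the client (purely illustrative; the token list and function are made up for the demo, not the real model):

```javascript
// Illustrative sketch: an autoregressive model emits one token at a time,
// so the client can render each token as soon as it exists.
function* generateTokens(prompt) {
  const canned = ["Streaming ", "shows ", "tokens ", "as ", "they ", "are ", "produced."];
  for (const tok of canned) {
    yield tok; // in a real model, each new token depends on all previous ones
  }
}

// Streaming consumer: a UI would update the page after every token.
let rendered = "";
for (const tok of generateTokens("why stream?")) {
  rendered += tok;
}
```

The point is that the first token can be shown long before the last token exists; waiting for the full string before displaying anything only adds latency.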

See this Stack Exchange question for more: conversational interface - Are there any UX reasons for ChatGPT staggering replies word by word? - User Experience Stack Exchange

TL;DR: It is very likely already complete, can’t turn off streaming.


  • Maybe it would make sense if the response were a story or something that could be extended iteratively, but sentences at least must be evaluated as a whole. It can’t just generate words by appending to existing text and still end up with such a smart response.

  • I often ask for code examples, and a snippet has to work as one meaningful program from top to bottom. It can’t be generated word by word, yet it is still streamed as if it were being typed.

  • I looked at the page source to figure out how it works and found a stream of messages that differ only in the last appended word. Each time a whole new response text is received, appending a single word to the previous one, the page reformats the message using Markdown syntax. The result is that the content jumps around slightly due to unexpected or incomplete formatting.

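The observation in the last bullet can be mimicked like this (a sketch; the real payloads, field names, and renderer differ):

```javascript
// Sketch of the observed stream: each server message carries the whole
// response so far, one word longer than the previous message.
const messages = [
  "Hello",
  "Hello world",
  "Hello world again",
];

// Naive client: re-render the full text on every message.
// Re-parsing the whole text each time is why partially written Markdown
// (e.g. an unclosed code fence) can make the layout jump around.
let lastRendered = "";
for (const msg of messages) {
  lastRendered = msg; // a real client would run a Markdown renderer here
}

// A smarter client could diff against the previous message and append
// only the new suffix instead of re-rendering everything.
const lastWord = messages[2].slice(messages[1].length).trim();
```

This also explains why each message in the stream looks almost identical to the previous one: only the final word is new.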

I appreciate the animation; I like it. It gives a realistic, futuristic feeling (even though real human chat is instant anyway). Also, the rich responses are long and informative even for simple questions.

My complaint is about long discussions. I usually enter a message and leave my desk while waiting for the response. The animation cannot be disabled: it is not a local mechanism I could override, because the response is streamed from the server.


It works by generating word by word, so it’s not just for show.

When you get a response from the API, the entire long response arrives after only 3–4 seconds. I think ChatGPT just doesn’t let us turn off this ‘typing animation’ that slows us down.
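For comparison, the API lets you opt out of streaming explicitly via the `stream` flag on the chat completions endpoint. A sketch of a non-streaming request (the model name is just an example, and `OPENAI_API_KEY` is a placeholder you'd supply yourself):

```javascript
// Build a non-streaming request body for the OpenAI chat completions API.
// With "stream": false the server sends the whole reply in one JSON payload
// instead of a token-by-token event stream.
function buildRequest(userMessage) {
  return {
    model: "gpt-3.5-turbo", // example model name
    messages: [{ role: "user", content: userMessage }],
    stream: false,          // ask for the complete response at once
  };
}

const body = buildRequest("Hello!");

// To actually send it (requires a real API key):
// fetch("https://api.openai.com/v1/chat/completions", {
//   method: "POST",
//   headers: {
//     "Content-Type": "application/json",
//     "Authorization": `Bearer ${OPENAI_API_KEY}`,
//   },
//   body: JSON.stringify(body),
// });
```

In the ChatGPT web UI there is no such switch, which is what the userscript below works around by hiding the in-progress message instead.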

I doubt you’ll see this since you were last here in April, but you can do it yourself with a little userscript and Tampermonkey:

// ==UserScript==
// @name         Toggle Response Streaming
// @version      0.1
// @author       Jake Elmstedt
// @description  Adds a checkbox to show/hide in-progress (streaming) responses
// @match        *://*/*
// @grant        GM_addStyle
// @run-at       document-end
// ==/UserScript==

(function() {
    'use strict';
    const hide = `.result-streaming {display: none !important;}`;
    const show = `.result-streaming {display: block !important;}`;
    // GM_addStyle returns the injected <style> element, so we can swap its contents later.
    const streamStyle = GM_addStyle(hide);

    function toggle_stream() {
        if (document.querySelector('#stream').checked) {
            streamStyle.textContent = show;
        } else {
            streamStyle.textContent = hide;
        }
    }

    function addCheckBox() {
        if (!document.querySelector('#stream')) {
            const spanElements = Array.from(document.getElementsByTagName('span'));
            const targetSpan = spanElements.find(span => span.querySelector('a')?.innerHTML.includes("ChatGPT"));
            if (!targetSpan) return; // UI not rendered yet; try again on the next mutation
            const streamCheckboxSpan = document.createElement('span');
            streamCheckboxSpan.style.paddingLeft = '10px'; // Add 10 pixels of left padding
            streamCheckboxSpan.innerHTML = '<input type="checkbox" id="stream" name="stream" value="stream"><label for="stream">Stream responses</label>';
            targetSpan.parentNode.insertBefore(streamCheckboxSpan, targetSpan.nextSibling);
            document.querySelector('#stream').addEventListener('change', toggle_stream);
        }
    }

    // Re-run addCheckBox whenever the page changes, since the UI is rendered dynamically.
    const observer = new MutationObserver(addCheckBox);
    window.onload = function() {
        setTimeout(() => {
            observer.observe(document, { childList: true, subtree: true });
        }, 250);
    };
})();