I have a working prototype using gpt-4-1106-preview and the completion api.
One thing that is critical for our use is streamed output as we need to put it out through TTS to the user as soon as possible.
Our example generates a fair amount of text. Without streamed output the result arrives in 20-30 seconds which is not acceptable for an interactive voice driven application.
Using streamed results the first bit of text arrives within 3 seconds and the remaining out paces the TTS so the user does not notice the delay.
I have looked through the Assistant api and I can’t see any way to do the same.