I’m seeing issues with streaming /chat/completions. I’m not using the Python package; we’ve written our client in Go, and it works great most of the time. We’re confident it’s not an issue with our setup. Sometimes, however, we seem to get nothing streamed back.
The request is valid and we get a 200 response. The first two or three tokens arrive, and then nothing. We set an 80-second timeout on the request, and what we see is a ~2-second delay until the first token, then another token, perhaps a third, and then nothing.
Our leading theory is that we are being rate limited in the middle of the stream. I’d think not, since we got a 200 initially, but it’s the only lead we have right now.
I’d be tempted to spin up a small piece of Python test code that replicates your request using the prebuilt openai library. If you get the same issues, you know it’s down to a busy server or network errors that need to be handled; if the issue goes away with the boilerplate library, you know there is some unhandled state in your code.
This topic is very relevant to me, as I am facing a similar issue with GPT-4 on my company account. Sometimes the stream prints only one or two words and then stalls completely until our backend server’s three-minute timeout fires. I am not using Python or Go. Has anyone else experienced this? Has anyone found a solution?