Rate limits in the middle of a stream

I’m seeing issues with streaming /chat/completions. We’re not using the Python package; we’ve written our own client in Go, and it works great most of the time. We’re confident it’s not an issue with our setup. Sometimes, however, we seem to get nothing streamed back.

The request is legitimate and we get a 200 response back. The first 2 or 3 tokens are sent, and then nothing. We set an 80 second timeout on the request, and what we’re seeing is a ~2 second delay until the first token, then another token, perhaps a third, and then nothing until the timeout.

Our leading theory is that we are being rate limited in the middle of the stream. I’d think not, since we got a 200 initially, but that’s our only lead right now.

I think I’d be tempted to spin up a piece of Python test code that replicates your request using the prebuilt openai library. If you get the same issues, you know it’s down to a busy server or network errors that need to be handled. If the issue goes away with the stock library, you know there is some unhandled state in your code.

This topic is really good for me as I am facing a similar issue with GPT-4 on my company account. Sometimes, the streaming starts to print only one or two words, and then it completely gets stuck until the three-minute timeout is triggered by our backend server. I am not using Python or Go. Is there anyone else who has experienced this? Has anyone found a solution?

I really don’t think it’s an issue with how we’re handling it, but we can try that to be sure.

I suspect it’s just everyday data transport issues and perhaps some server side problems, but it’s wise to build up some sanity checks for a baseline.

Just in case someone else is facing the same issue:

For now (and maybe permanently), I have added two options to my cURL request.

I set CURLOPT_NOPROGRESS (to false, so the callback actually runs) and CURLOPT_PROGRESSFUNCTION to check whether the stream is still making progress. If no new data arrives for more than 5 seconds, I kill the connection.
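
For anyone doing this in Go rather than cURL, here is a rough sketch of the same "kill it if nothing arrives for N seconds" idea. It is not anyone's production code: `readWithIdleTimeout`, `ErrStalled`, and `simulateStall` are made-up names, and the stalled stream is faked with an `io.Pipe` instead of a real API call. With a real request, `abort` would most likely be the cancel function of the request's context.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"time"
)

// ErrStalled reports that no new line arrived within the idle window.
var ErrStalled = errors.New("stream stalled: no data within idle window")

// readWithIdleTimeout (hypothetical helper) scans body line by line and
// passes each line to onLine. If no line arrives for longer than idle, it
// calls abort and returns ErrStalled. For a real HTTP request, abort would
// typically be the cancel function of the request's context.
func readWithIdleTimeout(body io.Reader, idle time.Duration, abort func(), onLine func(string)) error {
	lines := make(chan string)
	errc := make(chan error, 1)
	done := make(chan struct{})
	defer close(done)

	go func() {
		defer close(lines)
		sc := bufio.NewScanner(body)
		for sc.Scan() {
			select {
			case lines <- sc.Text():
			case <-done: // consumer gave up; stop forwarding
				return
			}
		}
		errc <- sc.Err()
	}()

	for {
		select {
		case line, ok := <-lines:
			if !ok {
				return <-errc // stream ended; nil on a clean EOF
			}
			onLine(line)
		case <-time.After(idle): // fresh idle window on every iteration
			abort()
			return ErrStalled
		}
	}
}

// simulateStall fakes a stream that sends two lines and then goes silent.
func simulateStall() ([]string, error) {
	pr, pw := io.Pipe()
	go func() {
		fmt.Fprintln(pw, "data: one")
		fmt.Fprintln(pw, "data: two")
		// No more writes and no close: the stream is now stalled.
	}()

	var got []string
	err := readWithIdleTimeout(pr, 200*time.Millisecond, func() { pr.Close() },
		func(line string) { got = append(got, line) })
	return got, err
}

func main() {
	got, err := simulateStall()
	fmt.Printf("received %d lines, err = %v\n", len(got), err)
}
```

The idle window restarts after every line, so a slow but steady stream is left alone and only a genuine gap trips it. On `ErrStalled` the caller can retry the request or surface an error instead of waiting out the full request timeout.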

Nice, and have you found that it works reliably for you? I think I’d be tempted to raise that 5 seconds to something more realistic for the internet, say 30 seconds?

Yeah, it is mostly stable now. And nah, 5 seconds is enough. That is the wait for just one word. One word taking 30 seconds is not acceptable.

CURLOPT_TIMEOUT - 180 seconds
CURLOPT_CONNECTTIMEOUT - 5 seconds
CURLOPT_NOPROGRESS / CURLOPT_PROGRESSFUNCTION - 5 seconds
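
If you are on Go's net/http instead of cURL, the first two settings map roughly onto the client like this. Treat it as a sketch: `newStreamingClient` and the constant names are made up for illustration, and applying the connect budget to the TLS handshake as well is my own assumption. The 5 second no-progress check has no single net/http option and needs its own logic around the body reads.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

const (
	totalTimeout   = 180 * time.Second // roughly CURLOPT_TIMEOUT
	connectTimeout = 5 * time.Second   // roughly CURLOPT_CONNECTTIMEOUT
)

// newStreamingClient (hypothetical helper) builds an HTTP client with an
// overall deadline and a separate, shorter connect deadline.
func newStreamingClient() *http.Client {
	dialer := &net.Dialer{Timeout: connectTimeout}
	return &http.Client{
		// Client.Timeout covers connecting, redirects, and reading the
		// response body, so it also caps the total length of a stream.
		Timeout: totalTimeout,
		Transport: &http.Transport{
			DialContext: dialer.DialContext,
			// Assumption: bound the TLS handshake by the same budget.
			TLSHandshakeTimeout: connectTimeout,
		},
	}
}

func main() {
	client := newStreamingClient()
	fmt.Println("total timeout:", client.Timeout)
}
```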

Easy peasy :grinning:
