Streaming Response Keeps on Breaking

Hi, I have been trying to use the GPT-4 Chat Completions API to stream responses.

My application backend is in FastAPI, and I am using a generator function that yields tmp.choices[0].delta.content as it loops over the openai.AsyncStream object.

Now the issue is that the streaming randomly stops. How do I fix it?

I am calling chat.completions.with_raw_response.create() with stream=True and all the other usual params like model.
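
Roughly like this (a simplified sketch; the client setup and the prompt are placeholders, the real params come from my app):

    from openai import AsyncOpenAI

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    # inside an async function:
    raw = await client.chat.completions.with_raw_response.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],  # placeholder prompt
        stream=True,
    )
    # with_raw_response returns a wrapper, so .parse() is needed
    # to get the actual AsyncStream back out
    response = raw.parse()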

This is my generator function, which I wrap in FastAPI's StreamingResponse:

    import logging

    logger = logging.getLogger(__name__)

    async def stream_reply(response):
        reply = ""
        try:
            async for chunk in response:
                # guard: some chunks (e.g. Azure content-filter chunks)
                # can arrive with an empty choices list
                if not chunk.choices:
                    continue
                tmp = chunk.choices[0].delta.content
                if tmp is None:
                    continue
                reply += tmp
                yield tmp
        except Exception:
            # Raising HTTPException here cannot work: the 200 headers are
            # already sent, so the client just sees a truncated stream.
            # Log the real error instead of masking it.
            logger.exception("Stream aborted")
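
And the endpoint wiring looks roughly like this (simplified, using the stream_reply generator above; the route and prompt are placeholders):

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from openai import AsyncOpenAI

    app = FastAPI()
    client = AsyncOpenAI()

    @app.post("/chat")  # placeholder route
    async def chat():
        raw = await client.chat.completions.with_raw_response.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello"}],  # placeholder
            stream=True,
        )
        return StreamingResponse(stream_reply(raw.parse()), media_type="text/plain")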

Any news on that?
I currently have the same issue using the Microsoft sample chat web app with the newest API preview version together with Azure Cognitive Search / Azure AI Search.

Every time I use the stream option and would get a large response, the stream randomly stops at some point.

As soon as I disable the stream, I of course need to wait, but I get the full response.

To reproduce, you can check out the sample app and set up the environment quickly.

I can't attach links, so let's do it the old-school way…
Search on GitHub for:
sample-app-aoai-chatGPT

Maybe you can set max_tokens=4096 in the chat.completions.with_raw_response.create() parameters. I am not sure why, but it has slightly reduced the stream-stopping incidents for me.
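
i.e. something like this (a sketch; inside an async function, with `client` and `messages` being whatever you already have):

    raw = await client.chat.completions.with_raw_response.create(
        model="gpt-4",
        messages=messages,   # your existing messages
        stream=True,
        max_tokens=4096,     # cap the completion length
    )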

That's actually what I already have :frowning:
Didn't work for me.

Then I am not sure how to resolve this issue. Maybe it is just the network connection acting up. If that's the case, then your only option is to choose the nearest cluster and avoid generating big answers via a streaming response.
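
One more thing you could try if it is the network: raise the client's read timeout. httpx applies the read timeout between chunks, so a long pause between tokens can silently cut a stream short (a sketch; the values are guesses):

    import httpx
    from openai import AsyncOpenAI

    client = AsyncOpenAI(
        # generous read timeout to ride out long pauses between chunks
        timeout=httpx.Timeout(60.0, read=120.0, connect=10.0),
        max_retries=2,  # retries only cover the initial request, not mid-stream drops
    )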

Or just try another LLM like Gemini, which generates more tokens per second compared to GPT.