Streaming Response Keeps on Breaking

Hi, I have been trying to use the GPT-4 Chat Completions API to stream responses.

My application backend is in FastAPI, and I am using a generator function that yields tmp.choices[0].delta.content as it loops over the openai.AsyncStream object.

Now the issue is that the streaming randomly stops. How do I fix it?

I am calling chat.completions.with_raw_response.create() with stream=True and all the other usual params like model.
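
Roughly like this (a simplified sketch; the client setup and the prompt are placeholders, the real params come from my app):

    from openai import AsyncOpenAI

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    # inside an async function:
    raw = await client.chat.completions.with_raw_response.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],  # placeholder prompt
        stream=True,
    )
    # with_raw_response returns a wrapper, so .parse() is needed
    # to get the actual AsyncStream back out
    response = raw.parse()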

This is my generator function, which I wrap in FastAPI's StreamingResponse:

    import logging

    logger = logging.getLogger(__name__)

    async def stream_reply(response):
        reply = ""
        try:
            async for chunk in response:
                # guard: some chunks (e.g. Azure content-filter chunks)
                # can arrive with an empty choices list
                if not chunk.choices:
                    continue
                tmp = chunk.choices[0].delta.content
                if tmp is None:
                    continue
                reply += tmp
                yield tmp
        except Exception:
            # Raising HTTPException here cannot work: the 200 headers are
            # already sent, so the client just sees a truncated stream.
            # Log the real error instead of masking it.
            logger.exception("Stream aborted")
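
And the endpoint wiring looks roughly like this (simplified, using the stream_reply generator above; the route and prompt are placeholders):

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    from openai import AsyncOpenAI

    app = FastAPI()
    client = AsyncOpenAI()

    @app.post("/chat")  # placeholder route
    async def chat():
        raw = await client.chat.completions.with_raw_response.create(
            model="gpt-4",
            messages=[{"role": "user", "content": "Hello"}],  # placeholder
            stream=True,
        )
        return StreamingResponse(stream_reply(raw.parse()), media_type="text/plain")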

Any news on that?
I currently have the same issue using the Microsoft sample chat web app with the newest API preview version together with Azure Cognitive Search / Azure AI Search.

Every time I use the stream option and would get a large response, the stream randomly stops at some point.

As soon as I disable the stream, I of course need to wait, but I get the full response.

To reproduce, you can check out the sample app and set up the environment quickly.

I can't attach links, so let's do it the old-school way…
Search on GitHub for:
sample-app-aoai-chatGPT

Maybe you can set max_tokens=4096 in the chat.completions.with_raw_response.create() parameters. I am not sure why, but it has slightly reduced the stream-stopping incidents for me.
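
i.e. something like this (a sketch; inside an async function, with `client` and `messages` being whatever you already have):

    raw = await client.chat.completions.with_raw_response.create(
        model="gpt-4",
        messages=messages,   # your existing messages
        stream=True,
        max_tokens=4096,     # cap the completion length
    )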

That's actually what I already have :frowning:
Didn't work for me.

Then I am not sure how to resolve this issue. Maybe it is just the network connection acting up. If that's the case, then your only option is to choose the nearest cluster and avoid generating big answers via a streaming response.
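
One more thing you could try if it is the network: raise the client's read timeout. httpx applies the read timeout between chunks, so a long pause between tokens can silently cut a stream short (a sketch; the values are guesses):

    import httpx
    from openai import AsyncOpenAI

    client = AsyncOpenAI(
        # generous read timeout to ride out long pauses between chunks
        timeout=httpx.Timeout(60.0, read=120.0, connect=10.0),
        max_retries=2,  # retries only cover the initial request, not mid-stream drops
    )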

Or just try another LLM like Gemini, which generates more tokens per second compared to GPT.