Hi, I have been trying to use the GPT-4 Chat Completions API to stream responses.
My application backend is in FastAPI, and I am using a generator function that yields x.choices[0].delta.content as it loops over the openai.AsyncStream object.
The issue is that the stream randomly stops partway through a response. How do I fix it?
I am calling chat.completions.with_raw_response.create() with stream=True, plus the usual parameters such as model.
This is my generator function, which I wrap in FastAPI's StreamingResponse:
    try:
        reply = ""
        async for x in response:
            tmp = x.choices[0].delta.content
            if tmp is None:
                continue
            reply = reply + tmp
            yield tmp
    except Exception as ex:
        raise HTTPException(
            status_code=500, detail="Something went wrong. Please try again."
        )
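For comparison, here is a minimal sketch of a more defensive version of this generator. It rests on two assumptions that are not confirmed anywhere in this thread: that some chunks can arrive with an empty choices list (this has been reported for Azure OpenAI), in which case x.choices[0] raises IndexError and ends the stream, and that the broad except is hiding such errors. The name stream_reply and the "[stream interrupted]" marker are made up for illustration. Once StreamingResponse has started sending the body, the status line is already out, so raising HTTPException inside the generator cannot turn the response into a 500; the sketch logs the error and ends the stream with a visible marker instead.

```python
import logging

logger = logging.getLogger(__name__)


async def stream_reply(response):
    """Yield text deltas from an async iterable of chat completion chunks."""
    try:
        async for chunk in response:
            # Assumption: a chunk may carry no choices at all; indexing [0]
            # on it would raise IndexError and silently end the stream.
            if not chunk.choices:
                continue
            text = chunk.choices[0].delta.content
            if text:
                yield text
    except Exception:
        # The response has already started, so a 500 is no longer possible.
        # Log the real cause and end the stream with a visible marker.
        logger.exception("stream interrupted")
        yield "\n[stream interrupted]"
```

It would be passed to StreamingResponse in the same way as the original generator. The logged traceback should at least show whether the stream stops because of an exception on the server side or because the upstream connection simply ends.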
Any news on that?
I currently have the same issue using the Microsoft sample chat web app with the newest preview API version, together with Azure Cognitive Search / Azure AI Search.
Every time I use the stream option and the response is large, the stream randomly stops partway through.
As soon as I disable streaming, I of course have to wait, but I get the full response.
To reproduce, you can check out the sample app and set up the environment quickly.
I can’t attach links, so let’s do it the old-school way…
Search on GitHub for:
sample-app-aoai-chatGPT
Maybe you can set max_tokens=4096 in the parameters of chat.completions.with_raw_response.create(). I am not sure why, but it has slightly reduced how often the stream stops.
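To check whether the token limit is really what cuts a response short, one option is to record the finish_reason of the last chunk. A rough sketch follows; stream_with_reason and the on_finish callback are hypothetical names, not part of the openai library. As far as I know, a finish_reason of "length" means the token limit was hit and "stop" means the model finished normally. If the stream ends with no finish_reason at all, the connection was probably cut before the final chunk arrived.

```python
async def stream_with_reason(response, on_finish):
    """Yield text deltas, then report the last finish_reason seen (or None)."""
    finish_reason = None
    async for chunk in response:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        # finish_reason stays None on every chunk except the final one.
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
        if choice.delta.content:
            yield choice.delta.content
    # Only reached when the upstream iterator ends without raising.
    on_finish(finish_reason)
```

Passing something like a logging call as on_finish gives one log line per response, which makes it easier to see whether the truncated answers line up with "length" or with a missing finish_reason.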
That’s actually what I already have.
It didn’t work for me.
Then I am not sure how to resolve this issue. Maybe it is just the network connection acting up. If that’s the case, then your only option is to choose the nearest cluster and avoid generating big answers via streaming.
Or just try another LLM like Gemini, which generates more tokens per second than GPT.