Streaming response breaks if OpenAI takes a long time

Hi, I am using the streaming OpenAI API, but when the answer is long, the response breaks.
Here is my code:

def call_openai_streaming(
    client, gpt_model_name, messages, temperature, response_format, final_response
):
    openai_stream = client.chat.completions.create(
        model=gpt_model_name,
        messages=messages,
        stream=True,
        temperature=temperature,
        response_format=response_format,
    )

    for chunk in openai_stream:
        if chunk.choices[0].delta.content is not None:
            final_response += chunk.choices[0].delta.content
            yield chunk.choices[0].delta.content


def qa_conversation(config, conversation_log, request):
    messages = [
        {"role": "system", "content": ....},
        {"role": "user", "content": ....},
    ]
    @traceable(run_type="chain", name=f"qa_conversation")
    def llm_call(messages):
        final_response = ""
        client = OpenAI(api_key=config["OPENAI_KEY"])
        yield from call_openai_streaming(
            client, "gpt-4", messages, 0.0, None, final_response
        )
        
    return llm_call(messages)

FastAPI endpoint:

@app.post("/generate_assistant_response")
def assistant_response(request: Request):
    headers = {
        "Content-Type": "text/plain",
        "Transfer-Encoding": "chunked",
        "Connection": "Transfer-Encoding",
    }
    
    conversation_log = request.conversation_log
    intent_type = request.intent.get("type", None)

    if intent_type == "qa":
        generator = qa_conversation(config, request.conversation_log, request)
        return StreamingResponse(generator, media_type="text/plain", headers=headers)

I am logging errors via LangSmith, and in that case LangSmith shows this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/langsmith/run_helpers.py", line 443, in generator_wrapper
    yield item
GeneratorExit

I am open to any suggestions that can help debug this issue in a better way. I only get the error when the answer is long; for short answers, there is no issue.
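For reference, one way to narrow this down is to instrument the streaming loop and log whether the OpenAI stream itself raised, or whether the consumer closed the generator. This is only a sketch: the function name, the try/except blocks, the counter, and the logging are additions and not part of the original code.

import logging

logger = logging.getLogger(__name__)

def call_openai_streaming_debug(
    client, gpt_model_name, messages, temperature, response_format, final_response
):
    openai_stream = client.chat.completions.create(
        model=gpt_model_name,
        messages=messages,
        stream=True,
        temperature=temperature,
        response_format=response_format,
    )
    chunks = 0
    try:
        for chunk in openai_stream:
            if chunk.choices[0].delta.content is not None:
                chunks += 1
                final_response += chunk.choices[0].delta.content
                yield chunk.choices[0].delta.content
    except GeneratorExit:
        # Raised when whatever is consuming the generator (StreamingResponse /
        # the HTTP connection) closes it before the stream finishes.
        logger.warning("Generator closed by consumer after %d chunks", chunks)
        raise
    except Exception:
        # Any error coming from the OpenAI stream itself.
        logger.exception("OpenAI stream failed after %d chunks", chunks)
        raise
    else:
        logger.info("Stream completed normally after %d chunks", chunks)

If only GeneratorExit ever shows up, the generator is being closed from the consuming side rather than failing inside the OpenAI call, which is a different problem than a stream error.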

Since we can’t see what “client” is being passed, is there a chance you are setting a shorter timeout on it than the time required to respond?

This is the client:

client = OpenAI(api_key=config["OPENAI_KEY"])

The openai library can get the OPENAI_API_KEY value from your OS environment with no additional code.

Here’s how to increase the timeouts, just to see whether that is the issue:

import httpx

client = OpenAI(timeout=httpx.Timeout(300.0, read=20.0, write=20.0, connect=10.0))
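If changing the client everywhere is not convenient, the v1 openai Python client also supports a per-request override via with_options. A sketch, reusing the messages variable and model name from the question; the 300-second value is arbitrary:

import httpx

# Per-request timeout override instead of a client-wide setting (openai v1.x).
stream = client.with_options(timeout=httpx.Timeout(300.0)).chat.completions.create(
    model="gpt-4",
    messages=messages,
    stream=True,
)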

client = OpenAI(api_key=config["OPENAI_KEY"])

This format suggests you are using the openai module. The openai client can explicitly accept an API key, but the key can also be omitted if it is set via an environment variable.
Furthermore,

def call_openai_streaming(
    client, gpt_model_name, messages, temperature, response_format, final_response
):

The above signature caught my attention.

The chat.completions.create method in the openai module does not accept a final_response parameter, which suggests the client instance may not be from the openai module but rather from another library such as LangChain.

In the given example, neither the import statements nor the caller of the call_openai_streaming function are explicitly shown.
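For what it’s worth, the imports the snippets appear to assume would be something like the following. This is a guess based on the names used, since the original post does not show them:

# Inferred imports, based on the names used in the snippets above
# (the original post does not show them, so this is a guess).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langsmith import traceable
from openai import OpenAI

# "Request" is annotated on the endpoint but accessed like a Pydantic model
# (request.conversation_log, request.intent), so its origin is unclear.

app = FastAPI()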

While LangSmith and W&B are extremely useful for observing the behavior of LLMs, when you seek advice from other members it helps to avoid ambiguous notation, state your dependencies clearly, and provide code that reproduces the problem as closely as possible.

This may sound like general OSS-community etiquette, but it matters when asking other members for help.

While this may not be a direct answer to the problem, I hope it can be of some help.

I have added more details. Can you look into that?