Streaming response breaks if OpenAI takes a long time

Hi, I am using the streaming OpenAI API, but when the answer is long, the response breaks.
Here is my code:

def call_openai_streaming(
    client, gpt_model_name, messages, temperature, response_format, final_response
):
    openai_stream = client.chat.completions.create(
        model=gpt_model_name,
        messages=messages,
        stream=True,
        temperature=temperature,
        response_format=response_format,
    )

    for chunk in openai_stream:
        if chunk.choices[0].delta.content is not None:
            final_response += chunk.choices[0].delta.content
            yield chunk.choices[0].delta.content


def qa_conversation(config, conversation_log, request):
    messages = [
        {"role": "system", "content": ....},
        {"role": "user", "content": ....},
    ]
    @traceable(run_type="chain", name=f"qa_conversation")
    def llm_call(messages):
        final_response = ""
        client = OpenAI(api_key=config["OPENAI_KEY"])
        yield from call_openai_streaming(
            client, "gpt-4", messages, 0.0, None, final_response
        )
        
    return llm_call(messages)

FastAPI endpoint:

@app.post("/generate_assistant_response")
def assistant_response(request: Request):
    headers = {
        "Content-Type": "text/plain",
        "Transfer-Encoding": "chunked",
        "Connection": "Transfer-Encoding",
    }
    
    conversation_log = request.conversation_log
    intent_type = request.intent.get("type", None)

    if intent_type == "qa":
        generator = qa_conversation(config, request.conversation_log, request)
        return StreamingResponse(generator, media_type="text/plain", headers=headers)

I am logging errors via LangSmith, and in that case LangSmith shows this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/langsmith/run_helpers.py", line 443, in generator_wrapper
    yield item
GeneratorExit

I am open to any suggestions that can help debug this issue in a better way. I only get the error when the answer is long; for short answers, there is no issue.
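For reference, one way to narrow this down is to instrument the streaming loop and log whether the OpenAI stream itself raised, or whether the consumer closed the generator. This is only a sketch: the function name, the try/except blocks, the counter, and the logging are additions and not part of the original code.

import logging

logger = logging.getLogger(__name__)

def call_openai_streaming_debug(
    client, gpt_model_name, messages, temperature, response_format, final_response
):
    openai_stream = client.chat.completions.create(
        model=gpt_model_name,
        messages=messages,
        stream=True,
        temperature=temperature,
        response_format=response_format,
    )
    chunks = 0
    try:
        for chunk in openai_stream:
            if chunk.choices[0].delta.content is not None:
                chunks += 1
                final_response += chunk.choices[0].delta.content
                yield chunk.choices[0].delta.content
    except GeneratorExit:
        # Raised when whatever is consuming the generator (StreamingResponse /
        # the HTTP connection) closes it before the stream finishes.
        logger.warning("Generator closed by consumer after %d chunks", chunks)
        raise
    except Exception:
        # Any error coming from the OpenAI stream itself.
        logger.exception("OpenAI stream failed after %d chunks", chunks)
        raise
    else:
        logger.info("Stream completed normally after %d chunks", chunks)

If only GeneratorExit ever shows up, the generator is being closed from the consuming side rather than failing inside the OpenAI call, which is a different problem than a stream error.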

Since we can’t see what “client” is being passed, is there a chance you are setting a shorter timeout on it than the time required to respond?

This is the client:

client = OpenAI(api_key=config["OPENAI_KEY"])

The openai library can get the OPENAI_API_KEY value from your OS environment with no additional code.

Here’s how to increase the timeouts, just to see whether that is the issue:

import httpx

client = OpenAI(timeout=httpx.Timeout(300.0, read=20.0, write=20.0, connect=10.0))
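If changing the client everywhere is not convenient, the v1 openai Python client also supports a per-request override via with_options. A sketch, reusing the messages variable and model name from the question; the 300-second value is arbitrary:

import httpx

# Per-request timeout override instead of a client-wide setting (openai v1.x).
stream = client.with_options(timeout=httpx.Timeout(300.0)).chat.completions.create(
    model="gpt-4",
    messages=messages,
    stream=True,
)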

client = OpenAI(api_key=config["OPENAI_KEY"])

This format suggests you are using the openai module. The openai client can explicitly accept an API key, but the key can also be omitted if it is set via an environment variable.
Furthermore,

def call_openai_streaming(
    client, gpt_model_name, messages, temperature, response_format, final_response
):

The above signature caught my attention.

The chat.completions.create method in the openai module does not accept a final_response parameter, which suggests the client instance may not be from the openai module but rather from another library such as LangChain.

In the given example, neither the import statements nor the caller of the call_openai_streaming function are explicitly shown.
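For what it’s worth, the imports the snippets appear to assume would be something like the following. This is a guess based on the names used, since the original post does not show them:

# Inferred imports, based on the names used in the snippets above
# (the original post does not show them, so this is a guess).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langsmith import traceable
from openai import OpenAI

# "Request" is annotated on the endpoint but accessed like a Pydantic model
# (request.conversation_log, request.intent), so its origin is unclear.

app = FastAPI()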

While LangSmith and W&B are extremely useful for observing the behavior of LLMs, when you seek advice from other members it helps to avoid ambiguous notation, state your dependencies clearly, and provide code that reproduces the problem as closely as possible.

This may sound like general OSS-community etiquette, but it matters when asking other members for help.

While this may not be a direct answer to the problem, I hope it can be of some help.

I have added more details. Can you look into that?