Getting ChunkedEncodingError in every stream request to GPT-4

Every time I send a streaming request to gpt-4 it starts fine, streaming chunks as expected. But it always gets interrupted by this ChunkedEncodingError:

requests.exceptions.ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

I see on GitHub this has been an issue going back months. I’m curious whether anyone else has seen it and can recommend any workarounds. I’ve tried wrapping the iteration over the response object in a try/except block, but once it reaches this state I don’t know how to resume streaming completions.

Edited to include code, adapted from the OpenAI Cookbook:

# vars declared above...
response = openai.ChatCompletion.create(
    model=model,
    messages=messages,
    temperature=temperature,
    top_p=top_p,
    frequency_penalty=freq_penalty,
    presence_penalty=pres_penalty,
    stream=True,
)

# Response stream
for chunk in response:
    # Mark the time (for reporting only)
    chunk_time = time.time() - start_time
    last_segment_time += time.time() - last_tick
    last_tick = time.time()

    # Collect streamed chunk
    collected_chunks.append(chunk)
    chunk_message = chunk["choices"][0]["delta"]
    collected_messages.append(chunk_message)
    segments.append(chunk_message)
    with open(out_path, "a") as file:
        file.write(chunk_message.get("content", ""))

    # Every 1 second, join the segment and print
    if last_segment_time > 1:
        segment_text = "".join(s.get("content", "") for s in segments)
        tqdm.write(f"Received: {segment_text}")
        segments = []
        last_segment_time = 0

print(f"Full response received {chunk_time:.2f} seconds after request")

Hi @ben.basseri

Can you share the code you’re using for streaming? Include the API call as well.

Sure, sps, I edited the original post to include code. Thanks for looking!

Thanks for sharing the code.

Please make sure that you’re on the latest version of the openai library.

In the context of your code, this error is raised while you’re streaming a response from the OpenAI API. It means that the connection was broken in the middle of receiving a chunk of the response.

Also, I’m curious why you’re writing the response to a file every second.

In what sense is the connection being broken? It looks like an improperly formatted chunk is being received, and that’s what raises the exception. The reason I’m writing to a file is to capture the response chunks before the completion fails. I could do this in a try/except block, but that doesn’t solve the fundamental issue I’m trying to address.
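For reference, here’s a minimal sketch of the try/except approach I mean (my own helper name, `drain_stream`, and it assumes the legacy openai 0.x chunk format where each chunk has a `choices[0]["delta"]` dict):

```python
import requests

def drain_stream(chunks):
    """Consume an iterator of streamed chat-completion chunks.

    Returns (text, finished): finished is False if the stream broke
    with a ChunkedEncodingError partway through, in which case text
    holds whatever content was received before the failure.
    """
    collected = []
    try:
        for chunk in chunks:
            delta = chunk["choices"][0]["delta"]
            collected.append(delta.get("content", ""))
        return "".join(collected), True
    except requests.exceptions.ChunkedEncodingError:
        # Connection broke mid-stream; salvage the partial text
        return "".join(collected), False

# Usage against the real API would look something like:
# response = openai.ChatCompletion.create(model=model, messages=messages, stream=True)
# text, finished = drain_stream(response)
```

This at least keeps the partial completion, but as I said it doesn’t fix the underlying broken connection.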

It could be due to network issues or server-side issues.

How does the usage show up on the dashboard? Is it greater than the tokens you consumed (prompt + generation received)?

I have the same issue when I generate long text using GPT-4.
It seems this error occurs when the generation exceeds 300 seconds.
There’s a similar issue (#399) in the openai-python GitHub repository.


Thanks for pointing out that GitHub issue. If the problem truly is the server timing out at 300 s, then I suppose the only immediate workaround is to ask for shorter completions.
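Another workaround I’m considering, given the apparent 300 s limit: on a broken stream, append the partial text as an assistant message and ask the model to continue from there. This is just a sketch with hypothetical names (`complete_with_retry`, `create_fn`); it assumes the legacy 0.x chunk format, and there’s no guarantee the model resumes seamlessly mid-sentence:

```python
import requests

def complete_with_retry(create_fn, messages, max_retries=3):
    """Retry a streaming completion that may break mid-stream.

    create_fn(messages) should return an iterator of chunks, e.g. a
    thin wrapper around openai.ChatCompletion.create(..., stream=True).
    On a ChunkedEncodingError, the partial text is appended as an
    assistant message and the model is asked to continue.
    """
    full_text = []
    msgs = list(messages)
    for _ in range(max_retries):
        partial = []
        try:
            for chunk in create_fn(msgs):
                delta = chunk["choices"][0]["delta"]
                partial.append(delta.get("content", ""))
            # Stream finished cleanly
            full_text.append("".join(partial))
            return "".join(full_text)
        except requests.exceptions.ChunkedEncodingError:
            # Keep what we got and ask the model to pick up from here
            text = "".join(partial)
            full_text.append(text)
            msgs = msgs + [
                {"role": "assistant", "content": text},
                {"role": "user", "content": "Continue exactly where you left off."},
            ]
    return "".join(full_text)
```

The stitched-together output may still have a seam at the break point, so shorter completions are probably the more reliable fix.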