In GPT-4 streamed responses, all chunks come in a single batch

I’m developing a Flutter app for a RAG virtual assistant. When I request streamed responses from OpenAI, I get all chunks at once in a single batch instead of one or a few chunks at a time, which is what I observed a few weeks ago when I first set up the system.
Do you know if there has been any change on the OpenAI API end?
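
For what it’s worth, a quick way to check the chunk timing outside the app is something like this (rough sketch with the openai Python package v1.x; it assumes OPENAI_API_KEY is set in the environment, and the model and prompt are just placeholders):

# Timing check: print when each streamed chunk arrives
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
start = time.monotonic()

stream = client.chat.completions.create(
    model="gpt-4-0125-preview",
    messages=[{"role": "user", "content": "Count from 1 to 30."}],
    stream=True,
)

for chunk in stream:
    elapsed = time.monotonic() - start
    if chunk.choices and chunk.choices[0].delta.content:
        # With "real" streaming the elapsed times spread out;
        # if everything arrives in one batch they are nearly identical.
        print(f"{elapsed:6.2f}s  {chunk.choices[0].delta.content!r}")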


I was going to say, are you sure there’s no programming error in your app?

But I’ve started observing the same thing with Azure; frames come in massive batches (maybe 10 or 12) with some models, in some regions, at some times.

I’m wondering if this is due to congestion or something :thinking:


I first observed it when connecting through Azure, and it’s the same after switching back to a direct OpenAI API connection.
I’m using gpt-4-0125-preview.


Mmh. I’m not having the issue with the OpenAI API. It’s always a good idea to check whether your issue also happens in the Playground, to figure out if it’s a you issue or an API issue.

For Azure, it’s even happening in the Playground (North Central, gpt 4 135).

Here’s a minimal Azure test in Jupyter to rule out any async issues (same result for me):

# Azure

import requests
import json
import os


endpoint = 'https://yourendpoint.openai.azure.com/openai/deployments/0314/chat/completions?api-version=2024-02-15-preview'
api_key_handle = 'AZURE_OPENAI_KEY_US_EAST'

# Ensure the Azure OpenAI key is set in the environment variables
openai_api_key = os.getenv(api_key_handle)
if openai_api_key is None:
    raise ValueError("Azure OpenAI key is not set in environment variables.")

#url = "https://api.openai.com/v1/chat/completions"
url = endpoint

headers = {
    "Content-Type": "application/json",
    "api-key": f"{openai_api_key}"
}

data = {
    "temperature": 1, 
    "max_tokens": 256,
    "logit_bias": {1734:-100},
    "messages": [
        {
            "role": "system", 
            "content": "You are the new bosmang of Tycho Station, a tru born and bred belta. You talk like a belta, you act like a belta. The user is a tumang."
        },
        {
            "role": "user",
            "content": "how do I become a beltalowda like you?"
        }
    ],
    "stream": True,  # Changed to True to enable streaming
}

response = requests.post(url, headers=headers, json=data, stream=True)

if response.status_code == 200:
    for line in response.iter_lines():
        if line:
            decoded_line = line.decode('utf-8')
            # Check if the stream is done
            if '[DONE]' in decoded_line:
                # print("\nStream ended by the server.")
                break
            json_str = decoded_line[len('data: '):]
            try:
                json_response = json.loads(json_str)
                if json_response['choices']:
                    delta = json_response['choices'][0]['delta']
                    if 'content' in delta and delta['content']:
                        print(delta['content'], end='', flush=True) 
                else:
                    print(json_response)
            except json.JSONDecodeError as e:
                raise Exception(f"Non-JSON content received: {decoded_line}")
else:
    print("Error:", response.status_code, response.text)

The reason for this behavior is the standard content filter used for the models in Azure. It’s not in the documentation currently, but you can configure in the UI that these filters should run in an asynchronous manner. If you do this, the batching will stop.
As I cannot include links in my response, just google it yourself: documentation for content filters in Azure.

This is how the option looks in the UI. I am sorry, the UI is in German, but I guess it looks very similar for you as well.
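
If you want to verify the effect after flipping that setting, a rough check is to measure the time between SSE events, e.g. with a sketch like this (it reuses the placeholder endpoint and key name from the test above):

# Measure gaps between streamed SSE events to see whether the batching is gone
import os
import time
import requests

endpoint = 'https://yourendpoint.openai.azure.com/openai/deployments/0314/chat/completions?api-version=2024-02-15-preview'
headers = {
    "Content-Type": "application/json",
    "api-key": os.environ["AZURE_OPENAI_KEY_US_EAST"],
}
data = {
    "messages": [{"role": "user", "content": "Count from 1 to 50."}],
    "max_tokens": 256,
    "stream": True,
}

gaps = []
last = None
with requests.post(endpoint, headers=headers, json=data, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        now = time.monotonic()
        if last is not None:
            gaps.append(now - last)
        last = now

# Expectation (if the filter explanation above is right): with the default synchronous
# filter you see many near-zero gaps plus a few long pauses between batches; with the
# asynchronous filter the gaps should even out.
if gaps:
    print(f"events: {len(gaps) + 1}, max gap: {max(gaps):.3f}s, mean gap: {sum(gaps) / len(gaps):.3f}s")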