Asynchronously Stream OpenAI GPT Outputs: Streamlit App

I could not find any good Streamlit examples online, so here’s my example of how to asynchronously stream OpenAI’s outputs in a Streamlit app.

Brilliant. Thanks - trying to get this working has been a challenge!

you’re welcome!

and welcome to the forum! :slight_smile:

Hi,
Does this require a specific version of the OpenAI package? I am using openai==0.28 and it is not working for me.

If in doubt, run pip install --upgrade openai; there have been updates recently.
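
To see which SDK generation is installed, a minimal sketch like the one below may help. The `from openai import OpenAI` client used in this thread belongs to the 1.x line of the Python package, so a 0.28 install fails at that import. The helper names `uses_v1_client` and `installed_openai_version` are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def uses_v1_client(version_string: str) -> bool:
    """Return True if the version string is 1.x or newer (hypothetical helper)."""
    major = int(version_string.split(".")[0])
    return major >= 1


def installed_openai_version() -> Optional[str]:
    """Return the installed openai package version, or None if it is missing."""
    try:
        return version("openai")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_openai_version()
    if found is None:
        print("openai is not installed")
    elif uses_v1_client(found):
        print(f"openai {found}: the OpenAI() client is available")
    else:
        print(f"openai {found}: upgrade with pip install --upgrade openai")
```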

Perhaps, a slightly simpler example:

from openai import OpenAI
import streamlit as st

st.title("OpenAI-Streamlit streaming response demo")

def get_response(user_prompt):
    response = OpenAI().chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        stream=True
    )

    for chunk in response:
        # Each chunk carries an incremental piece of the reply; skip empty deltas
        content = chunk.choices[0].delta.content
        if content:
            yield content

user_prompt = st.text_input("Enter your prompt")
if user_prompt and st.button("Get response"):
    st.write_stream(get_response(user_prompt))

It might be a simpler example but it’s not asynchronous.

Yes, that’s true. Just useful to play around but definitely not for production workloads.
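
For comparison, here is a rough sketch of what an asynchronous variant could look like. It is not the original poster's code. It assumes `client` is an `openai.AsyncOpenAI` instance (openai 1.x) and that the caller supplies an `update` callback such as the `markdown` method of an `st.empty()` placeholder. The names `openai_text_chunks`, `render_stream`, and `fake_chunks` are hypothetical, and `fake_chunks` exists only so the logic can be tried without an API key.

```python
import asyncio
from typing import AsyncIterator, Callable


async def openai_text_chunks(
    client, user_prompt: str, model: str = "gpt-4-turbo"
) -> AsyncIterator[str]:
    """Yield text pieces from a streamed chat completion.

    `client` is assumed to be an `openai.AsyncOpenAI` instance (openai>=1.0).
    """
    stream = await client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        stream=True,
    )
    async for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


async def render_stream(
    chunks: AsyncIterator[str], update: Callable[[str], object]
) -> str:
    """Accumulate chunks and pass the text so far to `update` after each one."""
    text = ""
    async for piece in chunks:
        text += piece
        update(text)
    return text


async def fake_chunks(reply: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the API stream, for trying this offline."""
    for word in reply.split(" "):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield word + " "
```

In a Streamlit script this could be wired up with something like `box = st.empty()` followed by `asyncio.run(render_stream(openai_text_chunks(AsyncOpenAI(), user_prompt), box.markdown))`. Because each piece is awaited, several `render_stream` calls should be able to run side by side under `asyncio.gather`, each updating its own placeholder, which the synchronous generator cannot do.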

Here’s an example with the Assistants API

Thanks for sharing

I agree with the use of write_stream.

I believe it was released after I discovered the hacky way of using st.empty() that I present here.

I could not readily find examples of streaming for Assistants API in Streamlit - so I decided to build one myself.

I'm mindful that the Python SDK has these helper functions, but I think this approach of iterating over the stream object is closer to how the chat completions API works.

Here is a snippet:

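        # Assumes these imports at the top of the script:
        #   from openai.types.beta.assistant_stream_event import ThreadMessageDelta
        #   from openai.types.beta.threads.text_delta_block import TextDeltaBlock
        stream = client.beta.threads.runs.create(
            thread_id=st.session_state.thread_id,
            assistant_id=ASSISTANT_ID,
            stream=True,
        )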
        
        # Empty container to display the assistant's reply
        assistant_reply_box = st.empty()
        
        # A blank string to store the assistant's reply
        assistant_reply = ""

        # Iterate through the stream 
        for event in stream:
            # There are various types of streaming events. You can check the data type and implement different display options (e.g. code block for tool use)
            # See here: https://platform.openai.com/docs/api-reference/assistants-streaming/events

            # Here, we only consider if there's a delta text
            if isinstance(event, ThreadMessageDelta):
                if isinstance(event.data.delta.content[0], TextDeltaBlock):
                    # empty the container
                    assistant_reply_box.empty()
                    # add the new text
                    assistant_reply += event.data.delta.content[0].text.value
                    # display the new text
                    assistant_reply_box.markdown(assistant_reply)
        
        # Once the stream is over, update chat history
        st.session_state.chat_history.append({"role": "assistant",
                                              "content": assistant_reply})
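
As a variation on the snippet above, the event filtering can be pulled into a generator so the same stream could be handed to st.write_stream instead of a manual st.empty() loop. This is only a sketch: it matches on the `event` and `type` string fields ("thread.message.delta" and "text", as I understand the SDK's event models) rather than using isinstance, so it needs no SDK imports. The names `text_deltas` and `fake_event` are hypothetical, and `fake_event` only builds stand-in objects for trying it offline.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def text_deltas(events: Iterable) -> Iterator[str]:
    """Yield the text from each message-delta event, ignoring other events."""
    for event in events:
        if getattr(event, "event", None) != "thread.message.delta":
            continue
        # A delta may hold several content blocks, or none at all
        for block in event.data.delta.content or []:
            text = getattr(block, "text", None)
            if getattr(block, "type", None) == "text" and text is not None and text.value:
                yield text.value


def fake_event(kind: str, value: str = "") -> SimpleNamespace:
    """Hypothetical stand-in for an SDK event, for trying the generator offline."""
    block = SimpleNamespace(type="text", text=SimpleNamespace(value=value))
    delta = SimpleNamespace(content=[block])
    return SimpleNamespace(event=kind, data=SimpleNamespace(delta=delta))
```

With this in place, `assistant_reply = st.write_stream(text_deltas(stream))` should both render the reply and return the full text for the chat history, assuming a Streamlit version that includes write_stream.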