How can I speed up an analytic chatbot that's based on LangChain (with agents and tools) and Streamlit, and disable its intermediate steps?

I created an analytic chatbot using LangChain (with tools and agents) for the backend and Streamlit for the frontend. It works, but for some users' questions it takes too long to output anything. Looking at the intermediate steps, I can see that the chatbot tries to print every relevant row. For example, below, the chatbot found 40 relevant comments and printed them out one by one in one of its intermediate steps (which takes up to a minute).


My questions are:

  1. Is there any way to speed up this process?
  2. How can I disable the intermediate output of the chatbot? (I already set return_intermediate_steps=False, verbose=False, and expand_new_thoughts=False, but the chatbot still shows the intermediate steps.)

Code for chatbot:

import os

import pandas as pd
import streamlit as st
from langchain.agents import (
    AgentExecutor,
    AgentType,
    ConversationalChatAgent,
    Tool,
    create_pandas_dataframe_agent,  # in newer releases this lives in langchain_experimental.agents
)
from langchain.callbacks import StreamlitCallbackHandler
from langchain.chat_models import AzureChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.memory.chat_message_histories import StreamlitChatMessageHistory

def load_data(path):
    return pd.read_csv(path)

if st.sidebar.button('Use Data'):
    # If button is clicked, load the EDW.csv file
    st.session_state["df"] = load_data('./data/EDW.csv')
uploaded_file = st.sidebar.file_uploader("Choose a CSV file", type="csv")


if "df" in st.session_state:

    msgs = StreamlitChatMessageHistory()
    memory = ConversationBufferWindowMemory(chat_memory=msgs, 
                                            return_messages=True, 
                                            k=5, 
                                            memory_key="chat_history", 
                                            output_key="output")
    
    if len(msgs.messages) == 0 or st.sidebar.button("Reset chat history"):
        msgs.clear()
        msgs.add_ai_message("How can I help you?")
        st.session_state.steps = {}

    avatars = {"human": "user", "ai": "assistant"}

    # Display a chat input widget
    if prompt := st.chat_input(placeholder=""):
        st.chat_message("user").write(prompt)

        llm = AzureChatOpenAI(
                        deployment_name = "gpt-4",
                        model_name = "gpt-4",
                        openai_api_key = os.environ["OPENAI_API_KEY"],
                        openai_api_version = os.environ["OPENAI_API_VERSION"],
                        openai_api_base = os.environ["OPENAI_API_BASE"],
                        temperature = 0, 
                        streaming=True
                        )
        
        max_number_of_rows = 40
        agent_analytics_node = create_pandas_dataframe_agent(
                                                        llm, 
                                                        st.session_state["df"], 
                                                        verbose=False, 
                                                        agent_type=AgentType.OPENAI_FUNCTIONS,
                                                        reduce_k_below_max_tokens=True, # to not exceed token limit 
                                                        max_execution_time = 20,
                                                        early_stopping_method="generate", # will generate a final answer after the max_execution_time has been surpassed
                                                        # max_iterations=2, # to cap an agent at taking a certain number of steps
                                                    )
        tool_analytics_node = Tool(
                                return_intermediate_steps=False,
                                name='Analytics Node',
                                func=agent_analytics_node.run,
                                description=f''' 
                                            This tool is useful when you need to answer questions about data stored in a pandas dataframe, referred to as 'df'. 
                                            'df' comprises the following columns: {st.session_state["df"].columns.to_list()}.
                                            Here is a sample of the data: {st.session_state["df"].head(5)}.
                                            When working with df, do not output more than {max_number_of_rows} rows at once, either in intermediate steps or in the final answer, because df could contain too many rows and overload memory. For example, instead of
                                            `df[df['survey_comment'].str.contains('wet', na=False, case=False)]['survey_comment'].tolist()`
                                            use
                                            `df[df['survey_comment'].str.contains('wet', na=False, case=False)]['survey_comment'].head({max_number_of_rows}).tolist()`.
                                            '''
                            )              
        
        tools = [tool_analytics_node] 
        chat_agent = ConversationalChatAgent.from_llm_and_tools(llm=llm, tools=tools, return_intermediate_steps=False)
    
        
        executor = AgentExecutor.from_agent_and_tools(
                                                        agent=chat_agent,
                                                        tools=tools,
                                                        memory=memory,
                                                        return_intermediate_steps=False,
                                                        handle_parsing_errors=True,
                                                        verbose=False,
                                                    )
        
        with st.chat_message("assistant"):
          
            st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
            response = executor(prompt, callbacks=[st_cb])
            st.write(response["output"])
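For reference, the row cap that the tool description asks the model to apply with `.head()` is the main lever on output size here. A minimal pandas sketch of the difference, using toy data standing in for EDW.csv:

```python
import pandas as pd

# Toy stand-in for the EDW data: 100 comments that all match the filter.
df = pd.DataFrame({"survey_comment": [f"comment {i}: the floor was wet" for i in range(100)]})

max_number_of_rows = 40

# Unbounded: returns every matching row (what makes the agent's
# intermediate step slow when there are many matches).
all_rows = df[df["survey_comment"].str.contains("wet", na=False, case=False)]["survey_comment"].tolist()

# Capped, as the tool description instructs: at most 40 rows.
capped = (
    df[df["survey_comment"].str.contains("wet", na=False, case=False)]["survey_comment"]
    .head(max_number_of_rows)
    .tolist()
)

print(len(all_rows), len(capped))  # 100 40
```

Whether the model actually follows this instruction depends on the prompt, so max_execution_time and max_iterations remain useful backstops.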

Hello @ill42
Did you find a solution for this?

Hi @elzeindima,

To remove the intermediate steps, just don't create a StreamlitCallbackHandler and don't pass it to the executor; everything else works without it.
For example, instead of:

st_cb = StreamlitCallbackHandler(st.container(), expand_new_thoughts=False)
response = executor(prompt, callbacks=[st_cb])

just do:

response = executor(prompt)