Issues with SQL Chatbot: Slow Responses, LLM Data Pull, and Caching Needs

Hello everyone,

I am developing a SQL chatbot using FastAPI, LangChain, OpenAI GPT-4o (or similar), and Snowflake as the data warehouse. Users ask questions in natural language and get answers back in natural language. However, I am facing several issues:

  1. Slow Response Times:

    • The chatbot often takes a long time to respond, especially for complex or repeated queries.
    • I suspect the delay comes from the LLM generating SQL and then executing it on Snowflake, but I cannot tell whether there are unnecessary data fetches or other bottlenecks in the pipeline.
  2. LLM Pulling Data into Itself:

    • From the logs and outputs, it appears the LLM is not just generating SQL but is also receiving query results, table samples, or intermediate data from Snowflake in its context.
    • This inflates both response time and token costs, since the LLM is processing far more data than it needs. I want it to generate SQL only and leave execution entirely to Snowflake (see the first sketch after this list).
  3. Caching Implementation:

    • I would like to implement caching so that repeated queries return quickly and improve the user experience.
    • I am unsure whether in-memory caching or Redis is the better fit, and how to integrate either with my FastAPI endpoints (a rough sketch of the Redis option follows after this list).
  4. Data Structure Choice:

    • Currently, my data is stored in Snowflake (relational tables).
    • I am wondering if a document-based database (like MongoDB) would be more suitable for my use case, or if sticking with Snowflake is better for a SQL chatbot focused on analytics and business intelligence.
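
To make point 2 concrete, here is roughly the "generate SQL only, execute separately" flow I think I need. The connection URI is a placeholder, and I am following the LangChain docs for `create_sql_query_chain`, so exact import paths may differ by version:

```python
from fastapi import FastAPI
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

app = FastAPI()

# Placeholder Snowflake URI; the real one uses my account/warehouse details.
db = SQLDatabase.from_uri(
    "snowflake://user:password@account/database/schema?warehouse=wh",
    sample_rows_in_table_info=5,
)
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# This chain only *writes* SQL from the question and schema info;
# it never executes anything or sees result rows.
write_query = create_sql_query_chain(llm, db)

@app.get("/ask")
def ask(question: str):
    sql = write_query.invoke({"question": question})  # LLM -> SQL string
    rows = db.run(sql)                                # Snowflake does the work
    # (Final natural-language answer step omitted for brevity.)
    return {"sql": sql, "rows": rows}
```

Is that the right way to guarantee the model never pulls result data into its context, or does the SQL agent/toolkit path unavoidably feed rows back to the LLM?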
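
And for point 3, this is the rough shape of the Redis option I am considering: cache the final answer keyed on a hash of the normalized question, with a TTL so stale analytics results expire. The key scheme, TTL, and `run_pipeline` stub here are placeholders, not a settled design:

```python
import hashlib
import json

import redis
from fastapi import FastAPI

app = FastAPI()
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
TTL_SECONDS = 600  # how long a repeated question may serve a cached answer

def cache_key(question: str) -> str:
    # Normalize whitespace/case so trivially different repeats still hit.
    normalized = " ".join(question.lower().split())
    return "chatbot:" + hashlib.sha256(normalized.encode()).hexdigest()

def run_pipeline(question: str) -> dict:
    # Stand-in for the real LLM -> SQL -> Snowflake -> answer flow.
    return {"answer": f"(answer for: {question})"}

@app.get("/ask")
def ask(question: str):
    key = cache_key(question)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # repeated query: skip the LLM and Snowflake
    answer = run_pipeline(question)
    cache.set(key, json.dumps(answer), ex=TTL_SECONDS)
    return answer
```

Would an in-memory dict be good enough for a single-worker deployment, or is Redis worth it from the start once I run multiple Uvicorn workers (where a per-process cache would not be shared)?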

Questions:
Has anyone faced similar issues with LangChain, OpenAI, and Snowflake? How can I ensure the LLM only generates SQL, with execution happening entirely on Snowflake, so that no result data is pulled back into the model's context? What is the best way to implement caching in a FastAPI chatbot for repeated queries? Any advice on the data-structure choice for this type of application would also be appreciated.

Thank you in advance for your help.

What I have tried so far: connection pooling, restricting `sample_rows_in_table_info` to 5, and logging the time taken at each step.
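
For reference, a trimmed-down sketch of that setup (the connection string and pool sizes are placeholders):

```python
import logging
import time
from contextlib import contextmanager

from langchain_community.utilities import SQLDatabase
from sqlalchemy import create_engine

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("chatbot")

# Pooled SQLAlchemy engine (placeholder URI and pool sizes).
engine = create_engine(
    "snowflake://user:password@account/database/schema?warehouse=wh",
    pool_size=5,
    max_overflow=2,
    pool_pre_ping=True,
)
db = SQLDatabase(engine, sample_rows_in_table_info=5)

@contextmanager
def timed(step: str):
    # Logs how long each stage of the pipeline takes.
    start = time.perf_counter()
    yield
    log.info("%s took %.2fs", step, time.perf_counter() - start)

# Usage in the endpoint:
#   with timed("sql_generation"):
#       sql = write_query.invoke({"question": question})
#   with timed("snowflake_execution"):
#       rows = db.run(sql)
```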