Streamlit using OpenAI for querying metadata for multiple Snowflake tables

I’m using a quickstart guide that includes a prompt file that lets us add top-level info about 2 Snowflake tables, which I send to OpenAI via a function I created: get_system_prompt
which is used like this:
st.session_state.messages = [{"role": "system", "content": get_system_prompt()}]

In this function I’ve hardcoded a table description variable that I wrote describing the 2 tables.

Unfortunately I can’t attach a link, but the quickstart I’m using is called “Frosty: Build an LLM Chatbot in Streamlit on your Snowflake Data”.

I’m looking for some recommendations on how to make this more dynamic so I don’t have to hardcode each table name, and also how to generate the table description for each table. Is it necessary to include the table description at all? It’s a lot of typing for something like 100 tables… Open to any/all suggestions!

Welcome to the forum! Testing some support bots. Hope you don’t mind!

Bot #1:

It sounds like you’re building a chatbot using Streamlit and integrating it with Snowflake data, and you’re seeking ways to dynamically handle information about multiple Snowflake tables without hardcoding each table name and description.

Here are some suggestions to achieve this:

Dynamic Table Information Retrieval:

Instead of hardcoding table names and descriptions, you can write a function to retrieve this information directly from Snowflake. Snowflake stores metadata about all tables in its Information Schema. You can write a SQL query to fetch table names, column names, data types, and even comments (which could serve as descriptions) from the INFORMATION_SCHEMA.TABLES and INFORMATION_SCHEMA.COLUMNS views.
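Here’s a minimal sketch of what that metadata query could look like, assuming you use Streamlit’s built-in Snowflake connection (`st.connection("snowflake")`) — adapt it to however the quickstart sets up your connection, and treat the database/schema names as placeholders:

```python
import streamlit as st

# Assumes a configured Snowflake connection named "snowflake" in secrets.toml;
# adjust to whatever connection the quickstart creates for you.
conn = st.connection("snowflake")

def get_table_metadata(database, schema):
    """Fetch table names, columns, types and comments from the Information Schema.

    Returns a pandas DataFrame with one row per column. The database/schema
    values are interpolated directly, so only pass trusted names.
    """
    query = f"""
        SELECT t.table_name,
               t.comment AS table_comment,
               c.column_name,
               c.data_type,
               c.comment AS column_comment
        FROM {database}.INFORMATION_SCHEMA.TABLES t
        JOIN {database}.INFORMATION_SCHEMA.COLUMNS c
          ON t.table_schema = c.table_schema
         AND t.table_name = c.table_name
        WHERE t.table_schema = '{schema}'
        ORDER BY t.table_name, c.ordinal_position
    """
    return conn.query(query)
```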

Automated Description Generation:

For generating table descriptions, you could either:

Use existing comments in the database if they are comprehensive enough.

Implement a basic algorithm to generate descriptions based on column names and data types. For instance, if a table has columns like ‘customer_id’, ‘order_date’, ‘total_amount’, a generated description could be “Table containing customer orders with details on order dates and amounts.”
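As a rough illustration, a rule-based generator building on the metadata DataFrame from the query above might look like this (the `describe_table` name and the uppercase column keys are assumptions based on how Snowflake typically returns result columns):

```python
import pandas as pd

def describe_table(table_name, columns_df):
    """Build a plain-English description from column names, types and comments.

    columns_df is the slice of the metadata DataFrame for a single table
    (see the metadata query above).
    """
    parts = []
    for _, row in columns_df.iterrows():
        col = f"{row['COLUMN_NAME']} ({row['DATA_TYPE']})"
        # Append the column comment when one exists in Snowflake.
        if pd.notna(row["COLUMN_COMMENT"]) and row["COLUMN_COMMENT"]:
            col += f": {row['COLUMN_COMMENT']}"
        parts.append(col)
    return f"Table {table_name} containing columns: " + ", ".join(parts) + "."
```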

Integrating Dynamic Retrieval in Streamlit:

You can use Streamlit’s caching mechanism to store the metadata so you don’t have to query Snowflake on every rerun. Use st.cache_data (or the legacy @st.cache on older Streamlit versions) to cache the results of your metadata-fetching function.
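For example, wrapping the metadata fetch in a cached function could look like this (a sketch; the ttl value and helper names are just examples):

```python
import streamlit as st

@st.cache_data(ttl=3600)  # re-query Snowflake at most once per hour
def load_all_metadata(database, schema):
    # Wraps the metadata query shown earlier so repeated reruns of the
    # Streamlit script reuse the cached result instead of hitting Snowflake.
    return get_table_metadata(database, schema)
```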

User Interface for Selecting Tables:

Implement a user interface element in Streamlit, like a dropdown, that lets users select a table. This dropdown can be dynamically populated with the names of the tables retrieved from Snowflake.
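A sketch of how that dropdown could be wired up, reusing the hypothetical `load_all_metadata` helper from above (the database and schema names are placeholders):

```python
metadata = load_all_metadata("MY_DATABASE", "PUBLIC")

# Populate the dropdown from the tables actually present in the schema.
table_names = sorted(metadata["TABLE_NAME"].unique())
selected_table = st.selectbox("Choose a table", table_names)

# Only the selected table's metadata needs to go into the system prompt.
selected_columns = metadata[metadata["TABLE_NAME"] == selected_table]
```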

Optional Descriptions:

Consider whether table descriptions are essential for every table. If they are not critical, you might allow users to view them optionally, reducing the initial information overload.
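One simple way to keep descriptions optional in the UI is an expander, for example (reusing the hypothetical helpers sketched above):

```python
with st.expander(f"Description of {selected_table}"):
    st.write(describe_table(selected_table, selected_columns))
```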

Documentation and Comments:

Encourage good practices for documentation within your database. Well-commented tables and columns in Snowflake will ease the process of generating descriptions and understanding the data structure.
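Comments added in Snowflake itself flow back through the INFORMATION_SCHEMA query shown earlier. As a sketch, you could set them once from Python with the Snowflake connector — or run the same COMMENT statements in a worksheet; the table, column, and credential values below are hypothetical:

```python
import snowflake.connector

# Connection details omitted; in practice load credentials from secrets, not code.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
)
cur = conn.cursor()
cur.execute("COMMENT ON TABLE orders IS 'One row per customer order.'")
cur.execute("COMMENT ON COLUMN orders.total_amount IS 'Order total in USD.'")
```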

Advanced: Natural Language Description Generation:

As an advanced feature, you could leverage an LLM (like GPT-3 or GPT-4) to generate more natural-sounding table descriptions based on the table and column metadata. This would require a more complex setup but could provide a more user-friendly experience.
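A rough sketch of that idea, using the current OpenAI Python client (adjust if your quickstart uses the older openai.ChatCompletion interface; the model name and helper function are assumptions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm_describe_table(table_name, columns_df):
    """Ask the model to summarise a table from its column metadata."""
    column_lines = "\n".join(
        f"- {row['COLUMN_NAME']} ({row['DATA_TYPE']})"
        for _, row in columns_df.iterrows()
    )
    response = client.chat.completions.create(
        model="gpt-4",  # pick whichever model you already use
        messages=[
            {"role": "system",
             "content": "You write one-sentence descriptions of database tables."},
            {"role": "user",
             "content": f"Table {table_name} has these columns:\n{column_lines}"},
        ],
    )
    return response.choices[0].message.content
```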

Remember to handle database credentials and access securely, especially when integrating with a web application like Streamlit. It’s important to ensure that your application only provides access to authorized users and maintains data privacy and integrity.

Bot #2

Hmm, I don’t know enough to give you a confident answer yet. The extracts provided do not contain specific information about querying metadata for multiple Snowflake tables using OpenAI in a Streamlit application. However, I can provide some general guidance based on the information available.

From the extracts, it’s clear that OpenAI models can be fine-tuned to better handle specific tasks or data structures. This could potentially be applied to your use case, where you want to dynamically handle descriptions for multiple Snowflake tables.

Fine-tuning involves preparing and uploading training data, training a new fine-tuned model, evaluating results, and using your fine-tuned model (source: https://platform.openai.com/docs/guides/fine-tuning). In terms of making your prompts more dynamic, the extracts suggest that you can use a few-shot prompt to extend or modify a fine-tuning dataset. This could potentially be used to extract a structured dataset from your unstructured text, and then fine-tune a custom model on that structured dataset (source: techniques_to_improve_reliability.md in the openai/openai-cookbook repository on GitHub).

For better results, it’s recommended to use the latest models and to be specific, descriptive, and as detailed as possible about the desired context, outcome, length, format, style, etc. (source: https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-openai-api#h_ad50145be0).

However, these are general guidelines and may not directly apply to your specific use case. For more detailed guidance, I recommend referring to OpenAI’s documentation and guides, such as the Fine-tuning guide (https://platform.openai.com/docs/guides/fine-tuning) and the Prompt engineering guide (https://platform.openai.com/docs/guides/prompt-engineering).

First answer likely more helpful for you?