LLM chain sending the whole database schema when I only need one table

hi guys,
I'm having a problem with the LLM chain where it adds the whole database schema as input. This is a problem because it pushes the token count to around 44,000 tokens, while the limit for the model I'm using is 16,000, so I'm trying to make it so OpenAI only receives one table from the database. Is there any way to do that? This is what I tried:

from langchain_community.utilities import SQLDatabase
from langchain_experimental.sql.base import SQLDatabaseChain
from langchain_openai import ChatOpenAI
from langsmith import Client
from langsmith import traceable
import a_env_vars
import os
import logging
from langchain_community.tools.sql_database.tool import InfoSQLDatabaseTool

# Set up logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Set the OpenAI API key
os.environ["OPENAI_API_KEY"] = a_env_vars.OPENAI_API_KEY

# Enable LangChain tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "chatbot"
os.environ["LANGCHAIN_API_KEY"] = ""
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

# Initialize LangSmith client (it authenticates with the LangSmith key, not the OpenAI one)
client = Client(api_key=os.getenv("LANGCHAIN_API_KEY"))

# Create the database URI
database_uri = (
    "mssql+pyodbc://@DESKTOP/DB"
    "?driver=ODBC+Driver+17+for+SQL+Server&autocommit=true"
)

# Create a SQLDatabase object
db = SQLDatabase.from_uri(database_uri)

# Create the InfoSQLDatabaseTool
schema_tool = InfoSQLDatabaseTool(db=db)

# Create the LLM
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=4000)

# Create the LangChain
cadena = SQLDatabaseChain.from_llm(llm, db, verbose=True)

# Custom response format
formato = """Given a user question:
1. Table structure: {estructura}
2. Create a valid SQL query.
3. Review the results.
4. Return the data in Spanish.
5. If you need additional clarifications, include them in the result.
Question: {question}"""

# Get the database structure using InfoSQLDatabaseTool
def obtener_estructura_db(schema_tool):
    """
    Fetch only the schema of the 'Product' table from the database.
    """
    estructura = {}
    table_name = "Product"  # Filter so only the "Product" table is fetched

    # Check that the "Product" table exists in the database
    if table_name in schema_tool.db.get_usable_table_names():
        # InfoSQLDatabaseTool takes a comma-separated string of table names
        schema = schema_tool.run(table_name)
        estructura[table_name] = schema
    else:
        logger.warning(f"Table {table_name} does not exist in the database.")

    return estructura

# Get the database structure (only the "Product" table)
estructura = obtener_estructura_db(schema_tool)

# Function that runs the query
@traceable
def consulta(input_usuario):
    """
    Query the OpenAI model, sending only the 'Product' table schema and the question.
    """
    consulta_formateada = formato.format(estructura=estructura, question=input_usuario)
    logger.debug(f"Formatted Query: {consulta_formateada}")

    # SQLDatabaseChain expects its input under the "query" key, so pass the
    # formatted prompt (table schema + question) there
    resultado = cadena.invoke({"query": consulta_formateada})
    return resultado

What did the langchain folks say? :thinking:

They gave me advice on how to debug it. I'm using LangSmith now, which is where I see the whole schema being added as input. SQLDatabaseChain is only supposed to take the query and the table structure as input.

It’s still a langchain issue :confused:

Or, as mentioned earlier, you could simply use a model with a bigger context length to get started :frowning:

Unless you’re asking how you can load a single table schema out of many, depending on what question is being asked or task is supposed to be solved?

You can use a RAG approach for that, I suppose. You could chunk your big schema, potentially generate chunk descriptions, and then use embedding search to figure out what to pull into the context.
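To make the retrieval idea above concrete, here's a minimal sketch of picking which table schema to pull into the context based on the question. A real setup would score chunks with an embedding model; simple keyword overlap stands in for it here, and the two table schemas are made-up examples:

```python
# Pick the schema chunk most relevant to the user's question.
# Keyword overlap is a stand-in for embedding similarity search.

def score(question, chunk):
    """Count how many words the question and the schema chunk share."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def pick_schema(question, schemas):
    """Return the name of the schema chunk that best matches the question."""
    return max(schemas, key=lambda name: score(question, schemas[name]))

schemas = {
    "Product": "CREATE TABLE Product (id INT, name VARCHAR(100), price DECIMAL)",
    "Orders": "CREATE TABLE Orders (id INT, product_id INT, quantity INT, date DATE)",
}

best = pick_schema("what is the price of each product name", schemas)
print(best)  # "Product" — it shares the most words with the question
```

Only the winning chunk then goes into the prompt, so the token count stays bounded no matter how many tables the database has.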

Yes, I'm trying to load a single specific table schema; it doesn't need to be a different one depending on the question. I tried using embedding search as you suggested, but it didn't solve the problem, and since I have no experience with embeddings it was hard to understand what went wrong. I think the problem is this line:

cadena = SQLDatabaseChain.from_llm(llm, db, verbose=True)

The chain takes a SQL database as a parameter, and the whole schema for that database is counted as input even if I only pull one table structure into the context.
What I'm trying now is creating a temporary database with only the table I need and deleting it when it's no longer needed, but creating and deleting a database for every use is not very optimal and it's giving me some other problems. I will ask again in the LangChain community now that I have a clearer idea of the problem, and I'll post the solution here if I find one.

There is a parameter on SQLDatabase.from_uri that allowed me to choose the tables I wanted to pull:
db = SQLDatabase.from_uri(a_env_vars.database_uri, include_tables=["Product"])
It decreased the token count enough for the app to work within the 16,000-token limit, so I should have been restricting the SQLDatabase that way in the first place. Frustratingly simple for how much trouble it's been giving me, but problem solved! Thank you
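For anyone landing here later, this sketch shows the effect of that filter. Since I can't spin up SQL Server in a snippet, it uses Python's built-in sqlite3 module to mimic what include_tables=["Product"] does for SQLDatabase.from_uri: restrict schema introspection to the listed tables. The table definitions are made up for illustration:

```python
# Illustrates restricting schema introspection to chosen tables, which is
# what include_tables does in LangChain's SQLDatabase. Uses stdlib sqlite3
# instead of SQL Server; the Product/Orders tables are example stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE Orders  (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER);
""")

def get_table_info(conn, include_tables=None):
    """Return CREATE TABLE statements, optionally limited to include_tables."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    if include_tables is not None:
        rows = [(name, sql) for name, sql in rows if name in include_tables]
    return "\n".join(sql for _, sql in rows)

full = get_table_info(conn)                       # every table -> big prompt
only_product = get_table_info(conn, ["Product"])  # one table -> small prompt
print(len(full) > len(only_product))  # True
```

With many tables the unrestricted schema dump grows linearly, which is exactly how my prompt reached 44,000 tokens; the filtered version only ever includes the tables the chain actually needs.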
