GPT Keeps Ignoring My Sample SQL Query 😩

I’m working on a feature where users enter a prompt, and GPT-4 generates a SQL query based on that. To guide it, I’m also passing in a sample SQL query that’s similar to what I want as a reference.

I’m using GPT-4 with temperature set to 0.3 — expecting it to stay grounded and logical — but it still keeps drifting. It often ignores the sample query completely and generates something structurally different or adds things I didn’t ask for.

Anyone else faced this?
How do you get GPT-4 to actually follow the sample query’s structure and logic more strictly?
Do I need to prompt it differently, or tweak other settings?

Can you please share more information?

  • What is your prompt?
  • What are some examples of accurate output?
  • What are you getting instead, and why is it inaccurate?

I also recommend using the more modern gpt-4.1 instead of the original GPT-4.

Hi @OnceAndTwice,

Sample Prompt: Give me the Timesheet Hours for the Project id 2930
Sample Query:
SELECT SUM(TH.hours) AS timesheet_hours, TH.project_id
FROM timesheet_hours TH
INNER JOIN project_main PM ON PM.id = TH.project_id
WHERE TH.live = 1 AND PM.live = 1 AND PM.id = 2930
GROUP BY TH.project_id

Here GPT sometimes removes TH.live = 1 or PM.live = 1 from the generated query, which can produce wrong output.

Currently using GPT-4 with temperature at 0.3.

Oh, you’ll need to give it your database schema and other relevant information in your system instructions before it’ll be able to follow that.
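Something along these lines, as a minimal sketch with the openai Python SDK (the schema summary and rules below are placeholders based on your example, so swap in your real table definitions):

from openai import OpenAI

client = OpenAI()

# Placeholder schema summary -- replace with your real table definitions (DDL or a compact description)
schema = """
timesheet_hours (TH): id, project_id, hours, live
project_main (PM): id, name, live
Business rule: always filter on TH.live = 1 AND PM.live = 1
"""

system_instructions = (
    "You translate user requests into SQL queries for the schema below.\n"
    "Only use the tables and columns listed in the schema.\n"
    "Keep the live = 1 filters on every table you join.\n\n"
    f"Schema:\n{schema}"
)

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0.3,
    messages=[
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": "Give me the Timesheet Hours for the Project id 2930"},
    ],
)
print(response.choices[0].message.content)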

You can also experiment with injecting examples after your prompt to help guide the model. This is a common technique known as few-shot prompting (see the sketch below).

I still also recommend experimenting with gpt-4.1 to see if this improves your accuracy.
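For example, here is a rough sketch that combines both ideas, reusing client and system_instructions from the sketch above, with one worked example as a prior user/assistant turn and gpt-4.1 as the model (the example pair is a placeholder you would replace with your own):

few_shot_messages = [
    {"role": "system", "content": system_instructions},
    # Worked example: a question plus the exact query you want as the answer
    {"role": "user", "content": "Give me the Timesheet Hours for the Project id 2930"},
    {
        "role": "assistant",
        "content": (
            "SELECT SUM(TH.hours) AS timesheet_hours, TH.project_id\n"
            "FROM timesheet_hours TH\n"
            "INNER JOIN project_main PM ON PM.id = TH.project_id\n"
            "WHERE TH.live = 1 AND PM.live = 1 AND PM.id = 2930\n"
            "GROUP BY TH.project_id"
        ),
    },
    # The real user question goes last
    {"role": "user", "content": "Give me the Timesheet Hours for the Project id 4101"},
]

response = client.chat.completions.create(
    model="gpt-4.1",
    temperature=0.3,
    messages=few_shot_messages,
)
print(response.choices[0].message.content)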

Let me know if these are applicable or helpful. 🙂

I’m not entirely sure I understand your use case, but based on what I understood, it seems that MCP (Model Context Protocol) could be a suitable option.

You can connect an MCP client agent to the MCP server. The agent determines which MCP server to contact, generates the appropriate query, and executes it based on the user’s input.

import os
import psycopg2
from dotenv import load_dotenv
from typing import Optional
from mcp.server.fastmcp import FastMCP
from psycopg2.extras import RealDictCursor

load_dotenv()

mcp = FastMCP(
    "Postgres Search Service",
    instructions="Search and retrieve information from Postgres databases.",
    debug=False,
    log_level="INFO",
    port=8002,
)


def get_connection():
    postgres_user = os.getenv("POSTGRES_USER")
    postgres_password = os.getenv("POSTGRES_PASSWORD")
    postgres_db = os.getenv("POSTGRES_DB")
    postgres_host = "postgres"

    print(f"Connecting to PostgreSQL with: {postgres_user}:{postgres_password}@{postgres_host}:5432/{postgres_db}")

    return psycopg2.connect(
        host=postgres_host,
        port="5432",
        dbname=postgres_db,
        user=postgres_user,
        password=postgres_password,
    )


@mcp.tool()
def query(sql: str, parameters: Optional[list] = None) -> str:
    """
    Execute a SQL query and return the results.

    Args:
        sql: SQL query to be executed
        parameters: Optional list of query parameters for parameterized execution.
    """
    conn = None
    try:
        conn = get_connection()

        with conn.cursor(cursor_factory=RealDictCursor) as cur:
            try:
                if parameters:
                    query_string = cur.mogrify(sql, parameters).decode("utf-8")
                else:
                    query_string = sql

                cur.execute(query_string)

                if cur.description is None:
                    conn.commit()
                    return f"Query executed successfully, {cur.rowcount} rows affected."

                rows = cur.fetchall()
                if not rows:
                    return "No results found."

                result_lines = ["Results:", "--------"]
                for row in rows:
                    try:
                        line_items = []
                        for key, val in row.items():
                            if val is None:
                                formatted_val = "NULL"
                            elif isinstance(val, (bytes, bytearray)):
                                formatted_val = val.decode("utf-8", errors="replace")
                            else:
                                formatted_val = str(val).replace("%", "%%")

                            line_items.append(f"{key}: {formatted_val}")
                        result_lines.append(" | ".join(line_items))

                    except Exception as e:
                        result_lines.append(f"Error formatting row: {str(e)}")
                        continue

                return "\n".join(result_lines)

            except Exception as e:
                return f'Error running query \n"{sql}"\n\n {str(e)}'

    except Exception as e:
        return f"Connection error: {str(e)}"

    finally:
        if conn:
            conn.close()


@mcp.tool()
def list_schemas() -> str:
    """List all the schemas in the database"""
    return query(
        "SELECT schema_name FROM information_schema.schemata ORDER BY schema_name"
    )


@mcp.tool()
def list_tables(db_schema: str = "public") -> str:
    """
    List all tables in a specific schema

    Args:
        db_schema: The schema name to list tables.
    """
    sql = """
    SELECT table_name, table_type
    FROM information_schema.tables
    WHERE table_schema = %s
    ORDER BY table_name
    """
    return query(sql, [db_schema])


@mcp.tool()
def describe_table(table_name: str, db_schema: str = "public") -> str:
    """Get detailed information about a table.

    Args:
        table_name: The name of the table to describe
        db_schema: The schema name (defaults to 'public')
    """
    sql = """
    SELECT 
        column_name,
        data_type,
        is_nullable,
        column_default,
        character_maximum_length
    FROM information_schema.columns
    WHERE table_schema = %s AND table_name = %s
    ORDER BY ordinal_position
    """
    return query(sql, [db_schema, table_name])


@mcp.tool()
def get_foreign_keys(table_name: str, db_schema: str = "public") -> str:
    """Get foreign key information for a table.

    Args:
        table_name: The name of the table to get foreign keys from
        db_schema: The schema name (defaults to 'public')
    """
    sql = """
    SELECT 
        tc.constraint_name,
        kcu.column_name as fk_column,
        ccu.table_schema as referenced_schema,
        ccu.table_name as referenced_table,
        ccu.column_name as referenced_column
    FROM information_schema.table_constraints tc
    JOIN information_schema.key_column_usage kcu
        ON tc.constraint_name = kcu.constraint_name
        AND tc.table_schema = kcu.table_schema
    JOIN information_schema.referential_constraints rc
        ON tc.constraint_name = rc.constraint_name
    JOIN information_schema.constraint_column_usage ccu
        ON rc.unique_constraint_name = ccu.constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY'
        AND tc.table_schema = %s
        AND tc.table_name = %s
    ORDER BY tc.constraint_name, kcu.ordinal_position
    """
    return query(sql, [db_schema, table_name])


@mcp.tool()
def find_relationships(table_name: str, db_schema: str = "public") -> str:
    """Find both explicit and implied relationships for a table.

    Args:
        table_name: The name of the table to analyze relationships for
        db_schema: The schema name (defaults to 'public')
    """
    try:
        fk_sql = """
        SELECT 
            kcu.column_name,
            ccu.table_name as foreign_table,
            ccu.column_name as foreign_column,
            'Explicit FK' as relationship_type,
            1 as confidence_level
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu 
            ON tc.constraint_name = kcu.constraint_name
            AND tc.table_schema = kcu.table_schema
        JOIN information_schema.constraint_column_usage ccu
            ON ccu.constraint_name = tc.constraint_name
            AND ccu.table_schema = tc.table_schema
        WHERE tc.constraint_type = 'FOREIGN KEY'
            AND tc.table_schema = %s
            AND tc.table_name = %s
        """

        implied_sql = """
        WITH source_columns AS (
            -- Get all ID-like columns from our table
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_schema = %s 
            AND table_name = %s
            AND (
                column_name LIKE '%%id' 
                OR column_name LIKE '%%_id'
                OR column_name LIKE '%%_fk'
            )
        ),
        potential_references AS (
            -- Find tables that might be referenced by our ID columns
            SELECT DISTINCT
                sc.column_name as source_column,
                sc.data_type as source_type,
                t.table_name as target_table,
                c.column_name as target_column,
                c.data_type as target_type,
                CASE
                    -- Highest confidence: column matches table_id pattern and types match
                    WHEN sc.column_name = t.table_name || '_id' 
                        AND sc.data_type = c.data_type THEN 2
                    -- High confidence: column ends with _id and types match
                    WHEN sc.column_name LIKE '%%_id' 
                        AND sc.data_type = c.data_type THEN 3
                    -- Medium confidence: column contains table name and types match
                    WHEN sc.column_name LIKE '%%' || t.table_name || '%%'
                        AND sc.data_type = c.data_type THEN 4
                    -- Lower confidence: column ends with id and types match
                    WHEN sc.column_name LIKE '%%id'
                        AND sc.data_type = c.data_type THEN 5
                END as confidence_level
            FROM source_columns sc
            CROSS JOIN information_schema.tables t
            JOIN information_schema.columns c 
                ON c.table_schema = t.table_schema 
                AND c.table_name = t.table_name
                AND (c.column_name = 'id' OR c.column_name = sc.column_name)
            WHERE t.table_schema = %s
                AND t.table_name != %s  -- Exclude self-references
        )
        SELECT 
            source_column as column_name,
            target_table as foreign_table,
            target_column as foreign_column,
            CASE 
                WHEN confidence_level = 2 THEN 'Strong implied relationship (exact match)'
                WHEN confidence_level = 3 THEN 'Strong implied relationship (_id pattern)'
                WHEN confidence_level = 4 THEN 'Likely implied relationship (name match)'
                ELSE 'Possible implied relationship'
            END as relationship_type,
            confidence_level
        FROM potential_references
        WHERE confidence_level IS NOT NULL
        ORDER BY confidence_level, source_column;
        """

        fk_results = query(fk_sql, [db_schema, table_name])
        implied_results = query(
            implied_sql, [db_schema, table_name, db_schema, table_name]
        )

        if fk_results == "No results found." and implied_results == "No results found.":
            return "No relationships found for this table"

        return f"Explicit Foreign Keys:\n{fk_results}\n\nImplied Relationships:\n{implied_results}"

    except Exception as e:
        return f"Error finding relationships: {str(e)}"


if __name__ == "__main__":
    mcp.run(transport="sse")

Git repo: mcp_server_client/workspace/servers/mcp_server_2.py at master · muralianand12345/mcp_server_client · GitHub
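If it helps, here is a rough sketch of calling this server over SSE with the MCP Python SDK client (the URL, port, and table name are assumptions based on the config above; double-check the client API against the SDK version you are using):

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    # FastMCP with the sse transport serves on /sse by default; port 8002 matches the server config above
    async with sse_client("http://localhost:8002/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Let the agent inspect a table before it writes SQL (table name is just an example)
            result = await session.call_tool("describe_table", {"table_name": "timesheet_hours"})
            print(result)


asyncio.run(main())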

But in my database schema, I have over 100 tables, each with more than 50 columns.
Passing the entire schema would consume too many tokens and may not be efficient.

Let me try using GPT-4.1 instead — it might handle the context better in this case.

You have to provide your schema in order for an LLM to know what your schema is.

Speaking of efficiency, 100 tables with 50 columns each sounds extremely inefficient in itself. So my suggestion for prompting the LLM efficiently would be to improve the schema first, which also means a smaller input to the model.

And if you really can’t give it your schema, then whoever is using the assistant will need to know the table and column names in advance.
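That said, a middle ground is to include only the tables that are relevant to the current question instead of the full schema. A rough sketch, with placeholder table summaries and a deliberately naive keyword match (swap in embeddings or a proper lookup for real use):

# Placeholder: one short summary per table; generate these once from information_schema
SCHEMA_SNIPPETS = {
    "timesheet_hours": "timesheet_hours(id, project_id, hours, live)",
    "project_main": "project_main(id, name, live)",
    # ... one entry per table
}


def relevant_schema(user_question: str) -> str:
    """Pick only the table summaries whose names appear in the question."""
    question = user_question.lower()
    picked = [
        snippet
        for table, snippet in SCHEMA_SNIPPETS.items()
        if any(word in question for word in table.split("_"))
    ]
    # Fall back to the full schema if nothing matched
    return "\n".join(picked) or "\n".join(SCHEMA_SNIPPETS.values())


# Only this subset goes into the system message, which keeps the token count down
schema_for_prompt = relevant_schema("Give me the Timesheet Hours for the Project id 2930")
print(schema_for_prompt)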