Openai text to sql not working

Team,

Trying to build Natural Language to sql generator , however both text-davinci-003 and gpt-3.5-turbo not able to generate accurate queries, i am passing proper datamodel in the input prompt,also customizing the input but no luck.

Can anyone suggest what could be next step gpt4? or any option to custom train gpt model?

1 Like

Welcome to the Forum!

Can you post some examples of your prompts and the replies generated by the model?

1 Like

Below is the input & output, first it was not creating proper query so i added in prompt “use CTE if query is complex” and bingo 90% work done, it only fails because not able to add proper table alias ,but when i added in input prompt do proper aliasing it terribly failed

example input

consider below tables and its properties
time_dimension(order_id integer primary key,order_date string,year integer,quarter integer,month integer)
employee_dimension(emp_id interger primary key,emp_name varchar2(20),title varchar2(20),department varchar2(20),region varchar(10))
sales_fact(order_id integer foreign key,emp_id integer foreign key,total integer,quantity integer,discount integer)
Generate SQL query to display monthwise employee who recorded the highest sales for last 3 months including current month as per Apache Presto syntax and type cast order_date column to date, if query is complex use CTE

example output

WITH highest_sales AS ( SELECT sd.month, sd.year, ed.emp_name, SUM(sf.total) AS total_sales FROM sales_fact sf JOIN time_dimension td ON sf.order_id = td.order_id JOIN employee_dimension ed ON sf.emp_id = ed.emp_id WHERE td.order_date >= DATE_FORMAT(DATE_TRUNC(‘month’, CURRENT_DATE - INTERVAL ‘3’ month), ‘%Y-%m-%d’) AND td.order_date < DATE_FORMAT(DATE_TRUNC(‘month’, CURRENT_DATE), ‘%Y-%m-%d’) GROUP BY sd.month, sd.year, ed.emp_name ), ranked_sales AS ( SELECT month, year, emp_name, total_sales, ROW_NUMBER() OVER (PARTITION BY month, year ORDER BY total_sales DESC) AS rank FROM highest_sales ) SELECT month, year, emp_name, total_sales FROM ranked_sales WHERE rank = 1 ORDER BY year, month

2 Likes

Is this with GPT-4 or 3.5? GPT-4 is more capable of following longer sets of instructions.

Additionally, you may be asking too many steps in one prompt. There are inference depth limits with models.

Ohh…its gpt-3.5-turbo-16k model, will surely try with gpt-4.

since gpt-4 is a paid one was trying to exercise caution and also wanted to check if fine tuning will work

But Definitely this information helps

1 Like