I am planning to build a chatbot that works on specific types of data sets. The data set contains prices, promotions, and other details from a few websites, and the data is updated periodically in a Postgres database. How do I build a model that converts a customer's natural-language questions into SQL queries and then visualizes the results? Ideally, it generates an SQL query that I can run on something like Metabase or Redash to get charts and graphs. The questions can be like "How many products are out of stock today for each competitor on Amazon", "How many products had a more than 10% price drop", etc.
You could try an instruction-based prompt for a 100% LLM solution, like:
You are an assistant that has access to a SQL database. The database has 3 columns: name, price, quantity. Please create a SQL script that fulfills the following request:
Request: {{input}}
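As a minimal sketch of that prompt-template approach: the table and column names below are placeholders (substitute your real schema), and `build_prompt` is just a hypothetical helper whose output you would pass to your LLM client of choice.

```python
# Placeholder schema -- replace with the actual tables/columns
# from your Postgres database.
PROMPT_TEMPLATE = (
    "You are an assistant that has access to a SQL database. "
    "The database has a table `products` with 3 columns: name, price, quantity. "
    "Please create a SQL script that fulfills the following request:\n"
    "Request: {question}\n"
    "Return only the SQL, with no explanation."
)

def build_prompt(question: str) -> str:
    """Fill the user's natural-language question into the template."""
    return PROMPT_TEMPLATE.format(question=question)
```

The resulting string is what you send as the LLM prompt; asking for "only the SQL" makes the response easier to pipe straight into Metabase or Redash.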
That’s one way. The other way is to build a library of common functions and use something like LangChain to chain the AI instructions together. That version is more stable and stays that way across AI upgrades. The only downside is that you will have to write a few functions here and there, but you could add the most common ones first and then add more over time.
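A hedged sketch of that "library of common functions" idea: each canned question maps to a parameterized SQL template, so the LLM (or even simple keyword matching) only has to pick an intent and fill in parameters instead of writing raw SQL from scratch. The intent names, table, and columns here are invented for illustration.

```python
# Hypothetical library of pre-written, parameterized queries.
# Column/table names are assumptions -- adapt to your schema.
COMMON_QUERIES = {
    "out_of_stock_by_competitor": (
        "SELECT competitor, COUNT(*) FROM products "
        "WHERE quantity = 0 AND scraped_on = %(day)s GROUP BY competitor"
    ),
    "price_drop_over_pct": (
        "SELECT COUNT(*) FROM products "
        "WHERE (old_price - price) / old_price > %(pct)s"
    ),
}

def lookup_query(intent: str, params: dict):
    """Return a (sql, params) pair for a known intent, or None."""
    sql = COMMON_QUERIES.get(intent)
    return (sql, params) if sql else None
```

Because the SQL is fixed and only parameters vary, this path is much less sensitive to model upgrades than free-form SQL generation.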
Thanks for the suggestion. Let me try the first solution and see how it works. I already have a set of natural-language questions and their corresponding SQL queries, so let me train on those and see how it goes. I think I also need to build a feedback system: once I pass the OpenAI-generated SQL to the Metabase API and it causes an error (because the query was wrong), it should regenerate the query and try again. Also, if there is a way to explain the query in natural language, that would be better too, as it will help users improve their prompting and get better results.
```python
def do_sql_stuff(sql_string_from_gpt):
    # `cursor` is assumed to be an already-open database cursor
    # (e.g. from psycopg2), and `ask_ai` below a wrapper around the LLM API.
    try:
        response = cursor.execute(sql_string_from_gpt)
        return response, True
    except Exception as e:
        return e, False

def summarize_sql(prompt, command=None, error=None):
    full_prompt = prompt
    if error is not None:
        full_prompt = (
            f"This ERROR occurred while fulfilling this REQUEST with this COMMAND. "
            f"[REQUEST]{full_prompt}[END REQUEST]\n"
            f"[COMMAND]{command}[END COMMAND]\n"
            f"[ERROR]{error}[END ERROR]\n"
            f"Please correct the error in the SQL command."
        )
    sql_command = ask_ai(full_prompt)
    response, success = do_sql_stuff(sql_command)
    return response, sql_command, success
```
The results from summarize_sql then go through some sort of logic to determine whether you want to retry. Definitely make sure there is an attempt counter.
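The retry-with-attempt-counter logic above can be sketched like this. `ask_ai` and `run_sql` are injected as parameters (both are assumptions standing in for your LLM call and database executor) so the loop is easy to stub out and test; `run_sql` is expected to raise on a bad query.

```python
# Cap on regeneration attempts -- tune to taste.
MAX_ATTEMPTS = 3

def query_with_retries(prompt, ask_ai, run_sql):
    """Retry loop: feed the failed command and its error back to the LLM."""
    command, error = None, None
    for _attempt in range(MAX_ATTEMPTS):
        full_prompt = prompt
        if error is not None:
            full_prompt = (
                f"This ERROR occurred while fulfilling this REQUEST with this COMMAND. "
                f"[REQUEST]{prompt}[END REQUEST]\n"
                f"[COMMAND]{command}[END COMMAND]\n"
                f"[ERROR]{error}[END ERROR]\n"
                f"Please correct the error in the SQL command."
            )
        command = ask_ai(full_prompt)
        try:
            return run_sql(command), command, True
        except Exception as e:
            error = e
    return error, command, False
```

On success it returns the result set; after MAX_ATTEMPTS failures it gives up and returns the last error, which you could surface to the user along with the last attempted SQL.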
Seems like such an obvious problem that you’d think someone would have created this already. Oh yeah, I did.
This is a command line tool to get you data from postgres.
The prompt is quite simple: ria/pagila2 at main · drorm/ria · GitHub
and the relevant code is in: ria/db.ts at main · drorm/ria · GitHub
The only bit where I can claim any cleverness is the way I generate the schema description for GPT. A real full size schema can get huge really quickly. I did some hacks to reduce the size.
Note that this was a quick and dirty project, so take everything with a grain of salt.
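To illustrate the schema-shrinking idea in general terms (this is a guess at the approach, not the format ria actually uses): one simple trick is to emit one line per table with column names only, dropping types, constraints, and indexes that eat tokens without helping the model.

```python
# Hypothetical compact schema serializer. The input shape
# (table name -> list of (column, type) pairs) is an assumption.
def compact_schema(tables: dict) -> str:
    """Render a token-cheap, one-line-per-table schema description."""
    lines = []
    for table, cols in tables.items():
        names = ", ".join(col for col, _type in cols)
        lines.append(f"{table}({names})")
    return "\n".join(lines)
```

A full `pg_dump`-style schema for a real database can easily blow past the context window, so this kind of lossy summary is often a necessary trade-off.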
I was doing some research for a project I am building and accidentally found out that LangChain already has a SQL chain built out. It came up randomly and is not related to what I am doing, but it reminded me of this post, so I just wanted to hit you up and let you know.
Sounds interesting from an Amazon Vendor’s perspective. If you are down to chat and dig deeper into this idea/product, feel free to reach out on Tgram at @exascaleESG.
Can you please have a look and guide me in some way?
And may I know the accuracy level of the English-to-SQL terminal converter you have made?
I ask because I don't see any lengthy prompts for making OpenAI understand that it should output a SQL query, which many online resources have suggested and which I have also tried.