Converting natural language to SQL query

datascience208 · October 4, 2023, 7:09am

Hello ,
I am trying to convert the natural language into SQL queries. Could you please provide some common problems in the conversion?
Thanks!

udm17 · October 4, 2023, 8:33am

It’s actually not that complex of a problem to solve.

If you provide the database columns and a small description of the data that is stored in the columns plus a sample query or two at temperature 0, you should be able to get really good accuracy for SQL query generation. The trick lies in finding what exactly to put in the context and what to put into the prompt

datascience208 · October 4, 2023, 11:31am

Thanks @udm17 .
But I want to get the same query for multiple type of questions.
For example:

How many customers have logged-in in last 30 days.
2)How many customers active in last 30 days.
In this case for the above two questions I want to get the same query.
Can we use embedding to solve this issue?
Thanks!

udm17 · October 4, 2023, 11:34am

Can you elaborate a bit more on how you would use the embeddings ?

Based on my trials with trying to generate code/cli, with GPT’s understanding, you should not have a problem with the same query being generated for different questions as long as the semantics and intent of the questions are the same

SomebodySysop · October 8, 2023, 10:38am

That is the whole point, I would think, of using natural language to create sql queries. This is basically how you do it: https://platform.openai.com/examples/default-sql-translate

Pretty simple. But, what I do in addition to providing the table layout is also provide more detailed field descriptions to help the model understand how they can be used to answer questions.

For example, if your last login date field is lastLogin, describe it as “last login activity date” or something like that. That way the model will know this field can be used to answer both queries.

Gadcuit · October 8, 2023, 2:41pm

Natural language is inherently ambiguous. For example, “How many customers have logged in in the last 30 days?” might imply counting logins, but it could also mean counting unique customers.

_j · October 8, 2023, 10:00pm

not really that ambiguous an example. simple not negation.

“How many customers have NOT logged in in the last 30 days?”

FROM CUSTOMERS...

or get really edgy

“How many NOT customers have logged in in the last 30 days?”

Gadcuit · October 9, 2023, 8:32am

Your right about clarity. Next point, I think that embedding is not the solution. It’s just a fix at the end of the problem and it will add more burden to the model. The solution probably lies in the architecture of the model.

ashishmahajan02 · November 12, 2023, 5:35pm

a few thoughts…
are you taking an out of the box llm and looking for NL2SQL/text2sql? are you looking for the model to be aware of a specific database/schema/data/etc. that you are looking to generate sql for? What is the complexity of sql you are looking to generate.

A base model may only give you a certain percentage of efficacy and accuracy and will not be aware of your specific schema, data, verbiage etc. You may have to fine-tune the model (Peft - LoRA, QLoRA) to make it more intelligent on sql and specific datasets you are looking to write sql against. This should increase the efficacy of the outputs. In addition you may need to use prompt engineering and RAG. Alternatively full tuning may be an option but is an expensive compute operation which requires readjusting the weights of base model. Fine-tuning is an alternative to full tuning where some of the weights from weight matrix maybe adjusted and is less compute intensive. OpenAI released GPT’s which are trained and intelligent on a specific, so a GPT for SQL will be something similar where it is primarily trained for the purpose of generating sql. One can go for further where gpt is further trained on an organizations schema and data and works proficiently to generate sql for a particular organization.

Topic		Replies	Views
Natural Language to SQL with huge table schema API	12	9182	December 19, 2023
Turning chatgpt API into a assistant for a (complex) website API	20	4301	December 21, 2023
Converting natural language into SQL query API	11	1176	December 18, 2023
Generate SQL queries combining prompt engineering and fine-tuning API	4	8802	December 24, 2023
How do i create a custom model for text to sql queries? API	13	11993	December 18, 2023

Converting natural language to SQL query

Related topics