Generate SQL queries combining prompt engineering and fine-tuning

cavadeos · April 17, 2023, 3:49pm

Hello,

My objective is to automate the generation of SQL queries when prompted with questions from business users.

To do so, I have started to use chatgpt (and similarly the openai.ChatCompletion.create function) to provide information about the the tables and steps to follow when given a business request (example: which column are cumulative or not, how to recognize whether to use one column or another when not specified by the business user etc).

Once instructions are clear, I also add some examples providing the question, the steps to follow to remove ambiguity and then the SQL query.

It is working pretty well. But I am limited in the number of examples I can give in addition to the context and explanations. in the API.
On one side I find very powerful the ability to explain which steps need to be followed when a business request arrives (through prompt engineering).
On the other side, the API limitation (5000 tokens) prevents me from providing a lot of examples that could be very beneficial for the model to learn how to handle queries (through model fine tuning).

Question is: what is the best way to combine the two?
My initial thoughts were to:

Provide questions and answers and to fine-tune a model
Add custom prompts based on this fine-tune model

But the drawback I see here is that the model will be fine-tuned without my guidance (what I put in the prompt on which steps should be followed to clarify the business input), which might be a pity.

What would be your advice / ideas?

abhi3hack · June 2, 2023, 12:37pm

Hey @cavadeos
Have you tried embeddings,
also have you found any solution to your problem

arnoldas · June 3, 2023, 3:59pm

I don’t have a perfect solution, but I have played with this idea as well. Assuming that your tables are named at least somewhat coherently. I found it most effective to give lots of examples of how I join the tables. That already takes care of the most tedious work of writing SQL queries (for me).

I could imagine having multiple API calls

To determine how to join tables
To extract which tables are needed from the request
Match tables with schemas in code
Combine all collected info into one query

Something like

“From question write SQL code joining the right tables needed. Here are a bunch of example of how I joined tables in the past”
“From the question extract which tables are needed to complete this request. (Here are available tables: XXXX) Return a list.”
Run over the list you get in 2) with python and depending on which tables are needed find the relevant schema for those tables
"You’re an assistant that helps write SQL queries. Given the Question, the answer {1} and answer {3} write a full SQL query.

Obviously ideal would be to just feed the entire schema every time, but that wouldn’t work with the current token limits, but hopefully something like this could help you be more selective.

tytung2020 · August 2, 2023, 7:12am

I think semantic search will be less accurate than sql search, as it is probabilistic.

Topic		Replies	Views
Turning chatgpt API into a assistant for a (complex) website API	20	4301	December 21, 2023
How to fine tune text to sql? API	19	9149	April 26, 2024
Converting natural language to SQL query API api , open-llm	9	18794	December 18, 2023
Seeking Advice on Fine-Tuning NLP Model for Response Generation Community gpt-35-turbo	3	125	February 24, 2025
Seeking Guidance on Building a ChatGPT-Style Data Analyst Tool with Database Integration Plugins / Actions builders gpt-4 , chatgpt , api , openai	11	4630	September 23, 2024

Generate SQL queries combining prompt engineering and fine-tuning

Related topics