Text to SQL generation

I try to fine tune model which give m text to SQL.I try to generalized model which work on any table schema and return m proper postgre SQL query.
Such that the model would then recognize all the tables that belong in the schema. So when I prompt for a SQL query in that schema, the model would know what subset of tables.
I know Example SQL translate. But I want to Generalized model. can any one give me suggestion what the prompt and completion Values need to fine tune.
and how much data we required to make perfect model


Very interesting question. Actually this is the focus of my work, more or less.
I assume you have seen this: GPT-3 Natural Language to SQL is impressive! - YouTube

I have tried several experiments with this approach to generate better prompts using attached lexical information for the underlying schema. Found the general ambiguity problem difficult however. I assume you mean $m$ possible solutions. If not please clarify.

Anyways the system I am using for this is on the AWS free tier. Launching Free-Tier C-Phrase on AWS - YouTube

Pretty straight forward to experiment with alternative prompts.

1 Like

I have built a streamlit app to help me experiment with GPT-3 for the purpose of generating SQL from text. I also received Codex private beta invite, hope to see its difference with text-davinci-002 model.

I also join waitlist
waiting for reply

I have done quite a few testing in this area (text to SQL) recently.
IMHO, based on the output that I got in my many testing, text-davinci-002 is better than codex.
Codex is probably better in writing different programming languages.

interesting feedback, I see text-davinci-002 is very good too. What SQL flavor did you evaluate?

Its correct
text devinci 002 is better

In my use case, it’s all select statement (query) but not insert,update, or delete.

Same I am also using select staement to accessing database

Other SQLs like delete, update, create, insert work well.

Very impressed by the following prompt/response

Table invoices, columns=[InvoiceId, CustomerId, InvoiceDate, BillingState, Total]
Table invoice_items, columns=[InvoiceId, TrackId, UnitPrice, Quantity]
Table albums, columns = [AlbumId, Title, ArtistId]
Table tracks, columns = [TrackId, Name, AlbumId]
Table artists, columns = [ArtistId, Name]
Create a SQLite query for top 5 artists whose albums were sold the most in last year

This generated query is embedded in python, but valid out-of-the-box

SELECT artists.Name Artist, COUNT(*) AlbumsSold 
FROM invoices 
INNER JOIN invoice_items ON invoices.InvoiceId = invoice_items.InvoiceId 
INNER JOIN tracks ON invoice_items.TrackId = tracks.TrackId 
INNER JOIN albums ON tracks.AlbumId = albums.AlbumId 
INNER JOIN artists ON albums.ArtistId = artists.ArtistId 
WHERE invoices.InvoiceDate BETWEEN '2011-01-01' AND '2012-01-01' 
GROUP BY artists.Name 

it works very nicely

There is clearly a lot of potential here and it’s just the tip of the iceberg.

If anyone is interested in joining a team that is building in this space, please hit me up.



I am interested
pls share

I signed in airops.com, it seems that your company has created many productivity tools, very interesting