I have started toying with GPT-3 to generate some SQL code based on questions. This might be useful for tech support, but it is also a easy way to get started with code generation (I am mostly interested in C# code generation, but it is somewhat harder).
I tried the playground and it seems to generate interesting results. Example below:
Context
Q: select all records from the employee table
A: SELECT * FROM Employee
###
Q: select the latest 10 records from the employee table
A: SELECT TOP 10 * FROM Employee ORDER BY id DESC
###
Q: select first name and last name for the latest 20 employees in the employee table
A: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC
###
Q: select full name for the first 30 employees on the employee table
A: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC
###
Q: check if there are records having the same LastName in the employee table
A: SELECT LastName, COUNT(*) AS Cnt FROM Employee HAVING COUNT(1) > 1
###
Q: select from employee table where login contains 'alex'
A: SELECT * FROM Employee WHERE login LIKE 'alex%'
###
Q: get employee count grouped by last name
A: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName
###
Q: create a delete snippet for table Foo
A: begin tran
select * from Foo
delete from Foo ...
select * from Foo
rollback
###
My question and the generated answer:
Q: create a delete snippet for table Employee where login contains 'alex'
A: begin tran
select * from Employee where login LIKE 'alex%'
delete from Employee where login LIKE 'alex%' ...
select * from Employee
rollback
I am happy with the answer especially given the small number of “learning material” (GPT understands quite a lot of context from little information I have provided).
However, using the Q&A API does not scale as I expect lots of tokens when knowledge piles up. I have checked the suggestion to store the context as a document from this thread.
I have generated a JSONL file that contains the playground context and uploaded it:
{"text": "To select all records from the employee table run: select all records from the employee table", "metadata": "SQL Server select all"}
{"text": "To select the latest 10 records from the employee table: SELECT TOP 10 * FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top"}
{"text": "To select first name and last name for the latest 20 employees in the employee table run: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top records"}
{"text": "To select full name for the first 30 employees on the employee table run: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top with concatenation"}
{"text": "To select from employee table where login contains 'alex' run: SELECT * FROM Employee WHERE login LIKE 'alex%'", "metadata": "SQL Server select with like"}
{"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName", "metadata": "SQL Server group with count"}
{"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName", "metadata": "SQL Server group with count"}
{"text": "To create a delete snippet for table Foo run: begin tran###select * from Foo###delete from Foo###...###select * from Foo###rollback", "metadata": "SQL Server delete snippet"}
I have used answers
endpoint to use this file. The request and the answer look like the following:
{
"model": "curie",
"question": "create a delete snippet for employee where login contains 'alex'",
"examples": [["select all records from the employee table run","SELECT * FROM Employee"]],
"examples_context": "To select all records from the employee table run: SELECT * FROM Employee",
"file": "file-E21nOsuh5E3SHbN9gB7qYMTS",
"search_model": "ada",
"max_tokens": 256,
"stop": ["\n", "<|endoftext|>"]
}
and the response:
{
"answers": [
"begin tran###select * from Employee###delete from Employee###...###select * from Employee###rollback"
],
"completion": "cmpl-3APyhhiMDOkF9gHpP8Uupp10b6DPE",
"file": "file-E21nOsuh5E3SHbN9gB7qYMTS",
"model": "curie:2020-05-03",
"object": "answer",
"search_model": "ada:2020-05-03",
"selected_documents": [
{
"document": 7,
"object": "search_result",
"score": 98.991,
"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName"
},
{
"document": 6,
"object": "search_result",
"score": 98.991,
"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName"
},
{
"document": 3,
"object": "search_result",
"score": 102.692,
"text": "To select the latest 10 records from the employee table: SELECT TOP 10 * FROM Employee ORDER BY id DESC"
},
{
"document": 5,
"object": "search_result",
"score": 102.992,
"text": "To select first name and last name for the latest 20 employees in the employee table run: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC"
},
{
"document": 4,
"object": "search_result",
"score": 104.351,
"text": "To select full name for the first 30 employees on the employee table run: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC"
},
{
"document": 2,
"object": "search_result",
"score": 114.151,
"text": "To select all records from the employee table run: select all records from the employee table"
},
{
"document": 1,
"object": "search_result",
"score": 235.965,
"text": "To create a delete snippet for table Foo run: begin tran###select * from Foo###delete from Foo###...###select * from Foo###rollback"
},
{
"document": 0,
"object": "search_result",
"score": 340.767,
"text": "To select from employee table where login contains 'alex' run: SELECT * FROM Employee WHERE login LIKE 'alex%'"
}
]
}
So, the answer is: “begin tran###select * from Employee###delete from Employee###…###select * from Employee###rollback”.
That’s a pretty decent answer, but I am wondering about the difference. Can I tweak the model so that the answer endpoint returns similar results to the Q&A API (used in the playground).
Both calls use curie engine with temperature = 0.
Any improvement ideas?
Thanks.