Using GPT-3 answer endpoint and "context" for code generation

alexd · June 14, 2021, 4:25pm

I have started toying with GPT-3 to generate some SQL code based on questions. This might be useful for tech support, but it is also a easy way to get started with code generation (I am mostly interested in C# code generation, but it is somewhat harder).

I tried the playground and it seems to generate interesting results. Example below:

Context

Q: select all records from the employee table
A: SELECT * FROM Employee
###
Q: select the latest 10 records from the employee table
A: SELECT TOP 10 * FROM Employee ORDER BY id DESC
###
Q: select first name and last name for the latest 20 employees in the employee table
A: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC 
###
Q: select full name for the first 30 employees on the employee table
A: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC
###
Q: check if there are records having the same LastName in the employee table
A: SELECT LastName, COUNT(*) AS Cnt FROM Employee HAVING COUNT(1) > 1
###
Q: select from employee table where login contains 'alex'
A: SELECT * FROM Employee WHERE login LIKE 'alex%'
###
Q: get employee count grouped by last name
A: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName
###
Q: create a delete snippet for table Foo
A: begin tran
select * from Foo
delete from Foo ...
select * from Foo
rollback
###

My question and the generated answer:

Q: create a delete snippet for table Employee where login contains 'alex'
A: begin tran
select * from Employee where login LIKE 'alex%'
delete from Employee where login LIKE 'alex%' ...
select * from Employee
rollback

I am happy with the answer especially given the small number of “learning material” (GPT understands quite a lot of context from little information I have provided).

However, using the Q&A API does not scale as I expect lots of tokens when knowledge piles up. I have checked the suggestion to store the context as a document from this thread.

I have generated a JSONL file that contains the playground context and uploaded it:

{"text": "To select all records from the employee table run: select all records from the employee table", "metadata": "SQL Server select all"}
{"text": "To select the latest 10 records from the employee table: SELECT TOP 10 * FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top"}
{"text": "To select first name and last name for the latest 20 employees in the employee table run: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top records"}
{"text": "To select full name for the first 30 employees on the employee table run: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC", "metadata": "SQL Server select top with concatenation"}
{"text": "To select from employee table where login contains 'alex' run: SELECT * FROM Employee WHERE login LIKE 'alex%'", "metadata": "SQL Server select with like"}
{"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName", "metadata": "SQL Server group with count"}
{"text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName", "metadata": "SQL Server group with count"}
{"text": "To create a delete snippet for table Foo run: begin tran###select * from Foo###delete from Foo###...###select * from Foo###rollback", "metadata": "SQL Server delete snippet"}

I have used answers endpoint to use this file. The request and the answer look like the following:

{
    "model": "curie",
    "question": "create a delete snippet for employee where login contains 'alex'",
    "examples": [["select all records from the employee table run","SELECT * FROM Employee"]],
    "examples_context": "To select all records from the employee table run: SELECT * FROM Employee",
    "file": "file-E21nOsuh5E3SHbN9gB7qYMTS",
    "search_model": "ada",
    "max_tokens": 256,
    "stop": ["\n", "<|endoftext|>"]
}

and the response:

{
  "answers": [
    "begin tran###select * from Employee###delete from Employee###...###select * from Employee###rollback"
  ],
  "completion": "cmpl-3APyhhiMDOkF9gHpP8Uupp10b6DPE",
  "file": "file-E21nOsuh5E3SHbN9gB7qYMTS",
  "model": "curie:2020-05-03",
  "object": "answer",
  "search_model": "ada:2020-05-03",
  "selected_documents": [
    {
      "document": 7,
      "object": "search_result",
      "score": 98.991,
      "text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName"
    },
    {
      "document": 6,
      "object": "search_result",
      "score": 98.991,
      "text": "To get employee count grouped by last name run: SELECT LastName, COUNT(*) AS Cnt FROM Employee GROUP BY LastName"
    },
    {
      "document": 3,
      "object": "search_result",
      "score": 102.692,
      "text": "To select the latest 10 records from the employee table: SELECT TOP 10 * FROM Employee ORDER BY id DESC"
    },
    {
      "document": 5,
      "object": "search_result",
      "score": 102.992,
      "text": "To select first name and last name for the latest 20 employees in the employee table run: SELECT TOP 20 FirstName, LastName FROM Employee ORDER BY id DESC"
    },
    {
      "document": 4,
      "object": "search_result",
      "score": 104.351,
      "text": "To select full name for the first 30 employees on the employee table run: SELECT TOP 30 FirstName + ' ' + LastName FROM Employee ORDER BY id DESC"
    },
    {
      "document": 2,
      "object": "search_result",
      "score": 114.151,
      "text": "To select all records from the employee table run: select all records from the employee table"
    },
    {
      "document": 1,
      "object": "search_result",
      "score": 235.965,
      "text": "To create a delete snippet for table Foo run: begin tran###select * from Foo###delete from Foo###...###select * from Foo###rollback"
    },
    {
      "document": 0,
      "object": "search_result",
      "score": 340.767,
      "text": "To select from employee table where login contains 'alex' run: SELECT * FROM Employee WHERE login LIKE 'alex%'"
    }
  ]
}

So, the answer is: “begin tran###select * from Employee###delete from Employee###…###select * from Employee###rollback”.

That’s a pretty decent answer, but I am wondering about the difference. Can I tweak the model so that the answer endpoint returns similar results to the Q&A API (used in the playground).

Both calls use curie engine with temperature = 0.

Any improvement ideas?

Thanks.

feeblemortal · February 18, 2022, 2:34pm

Curie is great for text based answers. There are 2 other engines that are fine tuned for code answers which are code-davinci-001 and code-cushman-001. With curie you can get code looking answers but they get pretty messy due to not being fine tuned for code.

Do you have access to the codex beta? If not then you will not have access to these engines. You can join the wait-list here: OpenAI Codex

Topic		Replies	Views
OpenAI and StackOverflow Prompting	3	2133	September 17, 2023
Fine-tuning 3.5 turbo to act as conversational AI like Non-Playable Character in games API fine-tuning	4	1544	October 4, 2023
Large codebase as knowledge for GPT-4-Turbo API gpt-4 , api	7	4572	February 17, 2024
Turning chatgpt API into a assistant for a (complex) website API	20	3885	December 21, 2023
How to use my codebase for code completion? API	5	3513	April 27, 2023

Using GPT-3 answer endpoint and "context" for code generation

Related topics