Fine-tune model based on mysql schema

Jack5 · January 6, 2023, 4:29pm

What’s the ideal way to create a model based on a mysql schema? For example, I’ve a really simple shopping cart application with the following schema:

customers: id, first_name,last_name,email,password,billing_streetaddress,billing_city,billing_state,billing_zip,billing_country,shipping_city,shipping_state,shipping_zip,shipping_streetaddress, shipping_country
order_items:id,order_id,product_id, quantity, price
orders:id,customer_id,date_placed,status{enum('pending','shipped','delivered')},total_amount
products:id, name, department,price, quantity,image_url, sfdcID

I’ve created fine-tune models like this:

{"prompt":"Get the list of all products with a price between $50 and $100, sorted by price in descending order ->","completion":" * from products where price between 50 and 100 order by price desc;"}
{"prompt":"Get the details of the most expensive product ->","completion":" * from products order by price desc limit 1;"}
{"prompt":"Get the list of products that are out of stock ->","completion":" * from products where quantity = 0;"}
{"prompt":"Get the list of all products in the Electronics department ->","completion":" * from products where department = 'electronics';"}
{"prompt":"What is the department of the 'Samsung Galaxy S20' product? ->","completion":" department from products where name = 'samsung galaxy s20';"}
.. {there's about another 30 more}

and like this:


{"prompt":"Generate a list of orders placed by customer ID [customer_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed], Status: [status], Total Amount: [total_amount]\n"}
{"prompt":"Generate a list of orders placed by customer ID [customer_id] that include product ID [product_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed]\n"}
{"prompt":"Generate a list of orders placed within the last month for product ID [product_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed], Status: [status], Total Amount: [total_amount]\n"}
{there's about another 30 more}

Seems like the fine-tune models w/SQL statements are more accurate - but is that the right approach? Or, should I be more thinking about better prompts?

raymonddavey · January 6, 2023, 4:43pm

Have you tried Codex instead of davinci-003?

The model names are different (code-davinci-002 or code-cushman-001)

Here is an example prompt.

# Table albums, columns = [AlbumId, Title, ArtistId]
# Table artists, columns = [ArtistId, Name]
# Table media_types, columns = [MediaTypeId, Name]
# Table playlists, columns = [PlaylistId, Name]
# Table playlist_track, columns = [PlaylistId, TrackId]
# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums by Adele

It is from here:

github.com

MicrosoftDocs/azure-docs/blob/main/articles/cognitive-services/openai/how-to/work-with-code.md

---
title: 'How to use the Codex models to work with code'
titleSuffix: Azure OpenAI
description: Learn how to use the Codex models on the Azure OpenAI Service to handle a variety of coding tasks
services: cognitive-services
manager: nitinme
ms.service: cognitive-services
ms.subservice: openai
ms.topic: how-to
ms.date: 06/24/2022
author: ChrisHMSFT
ms.author: chrhoder
keywords: 
---

# Codex models and Azure OpenAI

The Codex model series is a descendant of our GPT-3 series that's been trained on both natural language and billions of lines of code. It's most capable in Python and proficient in over a dozen languages including C#, JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.

You can use Codex for a variety of tasks including:

This file has been truncated. show original

Jack5 · January 6, 2023, 6:15pm

Wow - thanks for the fast response!
Yes, I’ve tried that and the model returns returns accurate statements too. I’m trying to create an app where I don’t need to pass the schema everytime a question is asked - is that even possible? ie:pass context between the same “session”?
Thx

raymonddavey · January 6, 2023, 6:31pm

The API calls have no memory of each other and I don’t think you can successfully train for table names

With that said, it may be possible with really strong reinforcement of the fine-tuning rules. I just don’t know how to do that effectively at the moment.

Topic		Replies	Views
How to fine tune text to sql? API	19	8505	April 26, 2024
I feel like my tuning model isn't "learning" API fine-tuning	3	505	September 29, 2023
Natural Language to SQL with huge table schema API	12	6908	December 19, 2023
Training data not getting desired fine tune output API fine-tuning	6	539	July 29, 2023
How to train Codex on a complex SQL legacy database model API codex	5	1936	December 19, 2023

Fine-tune model based on mysql schema

Related topics