Fine-tune model based on mysql schema

What’s the ideal way to create a model based on a mysql schema? For example, I’ve a really simple shopping cart application with the following schema:

customers: id, first_name,last_name,email,password,billing_streetaddress,billing_city,billing_state,billing_zip,billing_country,shipping_city,shipping_state,shipping_zip,shipping_streetaddress, shipping_country
order_items:id,order_id,product_id, quantity, price
orders:id,customer_id,date_placed,status{enum('pending','shipped','delivered')},total_amount
products:id, name, department,price, quantity,image_url, sfdcID

I’ve created fine-tune models like this:

{"prompt":"Get the list of all products with a price between $50 and $100, sorted by price in descending order ->","completion":" * from products where price between 50 and 100 order by price desc;"}
{"prompt":"Get the details of the most expensive product ->","completion":" * from products order by price desc limit 1;"}
{"prompt":"Get the list of products that are out of stock ->","completion":" * from products where quantity = 0;"}
{"prompt":"Get the list of all products in the Electronics department ->","completion":" * from products where department = 'electronics';"}
{"prompt":"What is the department of the 'Samsung Galaxy S20' product? ->","completion":" department from products where name = 'samsung galaxy s20';"}
.. {there's about another 30 more}

and like this:


{"prompt":"Generate a list of orders placed by customer ID [customer_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed], Status: [status], Total Amount: [total_amount]\n"}
{"prompt":"Generate a list of orders placed by customer ID [customer_id] that include product ID [product_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed]\n"}
{"prompt":"Generate a list of orders placed within the last month for product ID [product_id] ->","completion":" Order ID: [order_id], Customer ID: [customer_id], Date Placed: [date_placed], Status: [status], Total Amount: [total_amount]\n"}
{there's about another 30 more}

Seems like the fine-tune models w/SQL statements are more accurate - but is that the right approach? Or, should I be more thinking about better prompts?

Have you tried Codex instead of davinci-003?

The model names are different (code-davinci-002 or code-cushman-001)

Here is an example prompt.


# Table albums, columns = [AlbumId, Title, ArtistId]
# Table artists, columns = [ArtistId, Name]
# Table media_types, columns = [MediaTypeId, Name]
# Table playlists, columns = [PlaylistId, Name]
# Table playlist_track, columns = [PlaylistId, TrackId]
# Table tracks, columns = [TrackId, Name, AlbumId, MediaTypeId, GenreId, Composer, Milliseconds, Bytes, UnitPrice]

# Create a query for all albums by Adele


It is from here:

1 Like

Wow - thanks for the fast response!
Yes, I’ve tried that and the model returns returns accurate statements too. I’m trying to create an app where I don’t need to pass the schema everytime a question is asked - is that even possible? ie:pass context between the same “session”?
Thx

1 Like

The API calls have no memory of each other and I don’t think you can successfully train for table names

With that said, it may be possible with really strong reinforcement of the fine-tuning rules. I just don’t know how to do that effectively at the moment.

1 Like