Incorrect output and syntax errors after fine-tuning davinci for text-to-SQL conversion

Hi everyone, I am using the text-davinci-003 model to convert natural language into SQL. I have fine-tuned the davinci base model with 50 prompt-completion pairs, where each prompt is a natural-language query and each completion is the corresponding PostgreSQL statement. The fine-tuned model gives incorrect output and throws syntax errors as well. I also tried training the same model with empty prompts, but had no luck. Any suggestions for improving my results would be really helpful. Thanks!
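For anyone comparing notes on data formatting: the legacy fine-tuning endpoint expects a JSONL file of `prompt`/`completion` pairs, and the old fine-tuning guide suggested ending each prompt with a fixed separator and each completion with a stop token (with a leading space on the completion). A minimal sketch of producing such a file; the example pair and file name are my own illustrations, not the poster's actual data:

```python
import json

# Hypothetical training pair. The "\n\n###\n\n" separator and " END"
# stop token follow the conventions from the legacy fine-tuning guide;
# note the leading space on the completion.
pairs = [
    {
        "prompt": "List all customers from Germany\n\n###\n\n",
        "completion": " SELECT * FROM customers WHERE country = 'Germany'; END",
    },
]

# Write one JSON object per line (JSONL), as the endpoint expects.
with open("train.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")
```

At inference time you would then append the same separator to the user's question and set `" END"` as a stop sequence, so the model knows where the SQL ends.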

Can you provide a snippet of training data?

You may be better off using Babbage or Ada.
I’m currently using Ada with a little over 1,000 pieces of training/validation data, and it’s fairly successful with its queries.

Hi @RonaldGRuckus, Thank you for your reply.

  1. Would Babbage and Ada be better than davinci at converting text to SQL?
  2. For which specific use cases did you re-train? I want to understand which queries the model fails on.
  3. Did you create the 1,000 data points manually?
    I have attached a snippet of my training data.

Not better, just much more affordable, with the same results. You may need extra training, but the payoff (imo) would be worth it.

The failures are my fault. Too little data in certain areas, such as negations.

I used GPT-3.5 to create natural queries and then painstakingly wrote the answers myself for a couple hundred.

The training data looks alright. Do you have a validation set? 50 samples for what looks to be a pretty heavy database is far fewer than what’s recommended.

I’d 100% recommend using a validation set.

I started with a couple hundred samples, and using the graphs I was able to confidently keep going, trying different epoch counts until I reached a satisfactory validation sequence accuracy.

Hi @RonaldGRuckus, so the takeaway is that I should increase my data samples. Thank you, I will explore further.

Usually that’s the way to go, assuming the training data is still fresh and not causing overfitting.

You could also try increasing the epochs and seeing how that is reflected in the validation sequence accuracy (in this case, the most important metric).
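For reference, a fine-tune run with a validation file produces a results file with per-step metrics, including (per the legacy docs) a `validation_sequence_accuracy` column, i.e. the fraction of validation completions the model reproduced exactly. A small sketch of inspecting it; the CSV excerpt here is made up for illustration, not real run output:

```python
import csv
import io

# Hypothetical excerpt of a fine-tune results file. The column names
# follow the legacy fine-tuning docs; the values are invented.
results_csv = """step,training_loss,validation_loss,validation_sequence_accuracy
100,0.42,0.55,0.31
200,0.30,0.44,0.47
300,0.21,0.40,0.58
"""

rows = list(csv.DictReader(io.StringIO(results_csv)))

# Sequence accuracy counts a sample as correct only if the entire
# completion matches, which is the right bar for SQL generation.
accuracies = [
    float(r["validation_sequence_accuracy"])
    for r in rows
    if r["validation_sequence_accuracy"]
]
print("best validation sequence accuracy:", max(accuracies))
```

Watching this number across epoch settings (rather than training loss alone) is what tells you when more epochs stop helping and start overfitting.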