Seeking Advice on Fine-Tuning NLP Model for Response Generation

Hello community members, I have been fine-tuning an NLP model for SQL generation, but it is not returning the expected output based on my training data. I have spent time and money on two fine-tuning attempts with no success. I have validated my dataset and training process, but I am at a loss as to why the model is not performing as expected. Any insights, tips, or best practices for fine-tuning NLP models would be greatly appreciated. Thank you in advance for your help!

Welcome to the community forums!

Do you wanna run us through your process (model selection, training examples, inference prompt) so someone here might be able to give specific feedback?

In my experience you can get extremely far with prompts and dynamic prompt generation, but it really depends on what you’re doing 🙂

Thank you for the warm welcome and for offering to help!

Here’s an overview of my process:

  1. Model Selection:
  • I used OpenAI’s GPT-3.5 Turbo model for fine-tuning, as the task requires converting natural language queries into SQL commands while incorporating organizational schema and domain-specific terminology. Fine-tuning was necessary due to token limitations when including detailed schema information in prompts during inference.
  2. Training Examples:
  • My training dataset includes JSONL records structured as follows:
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful, cheerful database assistant. (I put here my Database schema with their constraints) Include column name headers in the query results. Always provide your answer in the JSON format below: { 'summary': 'your-summary', 'query': 'your-query' } Output ONLY JSON formatted on a single line. Do not use new line characters. In the preceding JSON response, substitute 'your-query' with Microsoft SQL Server Query to retrieve the requested data. In the preceding JSON response, substitute 'your-summary' with an explanation of each step you took to create this query in a detailed paragraph. Do not use MySQL syntax. Always limit the SQL Query to 100 rows. Always include all of the table columns and details. Here are the some alias DailyReportProject = DRP DailyReport = DR"
    },
    {
      "role": "user",
      "content": "Get projects."
    },
    {
      "role": "assistant",
      "content": "{\"summary\":\"Retrieve up to 10 projects with their project revision, excluding deleted entries.\",\"query\":\"SELECT TOP 10 PRO.ProjectId, PRO.ProjectNum, PRO.LatestProjectRevisionId, PRO.QcatProjectId, PRO.LastPartsQualityReportRun, PRO.BusinessAreaId, PRO.RiskIndex, PRO.InitialLiabilityTypeId, PRO.OnHold, PRO.FinalInvoiceSentByEmployeeId, PRO.FinalInvoiceSentOn, PR.ProjectRevisionId, PR.RevisionNum, PR.ProjectRevisionStatusId, PR.ProjectUpRevedOn, PR.StartDate, PR.EndDate, PR.InspectionEndDate, PR.InspectionEndByEmployeeId, PR.ForecastEndDate, PR.ProjectOwnershipTypeId, PR.POTypeId, PR.Vin, PR.GPCategoryTypeId, PR.ProjectPartiesRelationshipTypeId, PR.ServicePillarCategoryTypeId, PR.ProjectTypeId, PR.ProjectChargingMethodId, PR.GPQuotationNumber, PR.GPEstimateNumber, PR.CustomerReference, PR.SupplierSiteId, PR.CustomersSupplierReference, PR.CustomerId, PR.CustomerGroupContactId, PR.LocationCustomerId, PR.LocationCustomerGroupContactId, PR.HostCustomerId, PR.HostCustomerGroupContactId, PR.ReportingFrequencyTypeId, PR.BreakpointTotalNumberOfHours, PR.BreakpointCeasationDate, PR.BreakpointQuantityOfParts, PR.BreakpointNetInvoiceValue, PR.BreakpointBatchIdNumber, PR.BreakpointOther, PR.TipLevel, PR.ReviewDate, PR.ReviewFrequency, PR.InvoiceFrequencyId, PR.ProjectReviewStatusId, PR.ProjectDescription, PR.TabComplete, PR.IsBatchNoVisible, PR.IsBatchNoManadatory, PR.IsDeliveryNoteNoVisible, PR.IsDeliveryNoteNoManadatory, PR.IsDeliveryNoteDateVisible, PR.IsDeliveryNoteDateManadatory, PR.IsProductionDateVisible, PR.IsProductionDateManadatory, PR.Custom1Name, PR.IsCustom1Visible, PR.IsCustom1Manadatory, PR.Custom2Name, PR.IsCustom2Visible, PR.IsCustom2Manadatory, PR.ToolingRequirementFreeText, PR.SubmittedDate, PR.SubmittedByEmployeeId, PR.IndustryTypeId, PR.EmergencyPlantContact, PR.ReasonForChange, PR.ChangeRequestedBy FROM Project PRO INNER JOIN ProjectRevision PR ON PRO.LatestProjectRevisionId = PR.ProjectRevisionId AND PR.SysDeleted = 0 WHERE 
PRO.SysDeleted = 0;\"}"}"
    }
  ]
}
  • I validated the dataset to ensure proper formatting and added diverse examples covering basic queries, joins, filters, and edge cases.
  3. Inference Prompt:
  • After fine-tuning, I used the following API call during inference:
{
  "model": "ft:gpt-3.5-turbo-0125:personal::your-model-id",
  "messages": [
    { "role": "user", "content": "Get projects" }
  ]
}
  • Unfortunately, the responses often did not align with the expected SQL structure or context provided in the training data.
  4. Current Issues:
  • Despite ensuring data quality and aligning the structure with OpenAI’s guidelines, the fine-tuned model isn’t performing as expected. It often returns generic responses or fails to correctly incorporate schema details.
  5. Reason for Fine-Tuning:
  • My task involves converting user queries into SQL based on an organization’s specific schema and terms. While prompts work well, including schema and terminology in the prompt frequently exceeds token limits, making fine-tuning seem like a logical solution.
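As a rough illustration of the validation mentioned in step 2, the check I run over the JSONL file looks something like this (a minimal sketch; the filename and the helper name `validate_jsonl` are just placeholders of mine):

```python
import json

def validate_jsonl(path):
    """Basic sanity check that each line is valid JSON in the chat format
    expected for fine-tuning. Not exhaustive, just catches obvious issues."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as e:
                errors.append(f"line {lineno}: invalid JSON ({e})")
                continue
            messages = record.get("messages")
            if not isinstance(messages, list) or not messages:
                errors.append(f"line {lineno}: missing 'messages' list")
                continue
            roles = [m.get("role") for m in messages]
            # Every example should contain at least a user and an assistant turn.
            if "user" not in roles or "assistant" not in roles:
                errors.append(f"line {lineno}: needs user and assistant roles")
            for m in messages:
                if m.get("role") not in ("system", "user", "assistant"):
                    errors.append(f"line {lineno}: unknown role {m.get('role')!r}")
                if not isinstance(m.get("content"), str):
                    errors.append(f"line {lineno}: content must be a string")
    return errors
```

Running this over my dataset came back clean, which is why I believe the formatting itself is fine.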

Do you think there might be issues with how the training data is structured or the way the model is prompted post-fine-tuning? Would dynamic prompts be a better approach, even with token limitations? I’d love to hear any suggestions or tips based on your experience!
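Edit: for reference, here is how the inference request body from step 3 is assembled (a sketch in Python; the model id is the same placeholder as above, and `build_request` is just an illustrative helper of mine):

```python
import json

FINE_TUNED_MODEL = "ft:gpt-3.5-turbo-0125:personal::your-model-id"  # placeholder id

def build_request(user_content: str) -> dict:
    """Build the chat-completions request body sent at inference time.
    Note: unlike my training examples, no system message is included here."""
    return {
        "model": FINE_TUNED_MODEL,
        "messages": [
            {"role": "user", "content": user_content},
        ],
    }

payload = build_request("Get projects")
print(json.dumps(payload, indent=2))
```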