GPT 3.5 Custom Training Challenges in Multi-Category Text Classification

I’m currently fine-tuning the “gpt-3.5-turbo-1106” model on insurance claim text data to predict Case Type, Case Category, and Case Sub-category.

Each Case Type, Case Category, and Case Sub-category triple forms one Combination, and I have 248 such combinations in total.

So, each input text falls into exactly one of these 248 combinations.
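To give an idea of the structure, the combinations dictionary I pass into the system prompt (see the code further down) is keyed by combination name. The entries and value structure shown here are made-up placeholders for illustration, not my real categories:

# Illustrative placeholders only, not my actual categories
combinations = {
    "Motor | Own Damage | Windshield": {
        "Case Type": "Motor",
        "Case Category": "Own Damage",
        "Case Sub Category": "Windshield",
    },
    "Health | Reimbursement | Hospitalization": {
        "Case Type": "Health",
        "Case Category": "Reimbursement",
        "Case Sub Category": "Hospitalization",
    },
    # ... remaining combinations
}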

  1. Initially, I trained the model on a limited dataset of only 4 combinations, each with 45 records (180 records in total).

The achieved accuracy was 85%.

  2. Subsequently, I expanded the training data to 27 combinations, for a total of 1215 records (45 records per combination).

However, the accuracy dropped significantly, to only 45%.

My strategy includes preprocessing and cleansing the input text, alongside a system prompt to guide the model. I included the 27 combination names in the system prompt and trained for 10 epochs.

json_response = '{"Case Type": "' + case_type + '", "Case Category": "' + case_category + '","Case Sub Category": "' + sub_category + '"}'
        
fine_tuning_data.append({

"messages": [
{"role": "system", "content": f"You are a helpful assistant. Your task is to classify the text by the user into one of the following pre-defined Combinations. Each key in combinations dictionary is one combination: {combinations}"},
{"role": "user", "content": row['text']},
{"role": "assistant", "content": json_response}
]

})
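For completeness, this is roughly how I write the examples out and start the job with the openai Python SDK (the file name below is just a placeholder; the 10 epochs matches what I described above):

import json
from openai import OpenAI

client = OpenAI()

# One JSON object per line, as chat fine-tuning expects
with open("insurance_claims_train.jsonl", "w", encoding="utf-8") as f:
    for example in fine_tuning_data:
        f.write(json.dumps(example) + "\n")

# Upload the training file and launch the fine-tuning job
training_file = client.files.create(
    file=open("insurance_claims_train.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo-1106",
    training_file=training_file.id,
    hyperparameters={"n_epochs": 10},
)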

My primary concerns are as follows:

  1. Achieving high accuracy remains a challenge with a larger set of combinations.

  2. Although the model performs well with fewer combinations, maintaining accuracy as the dataset expands (which is the plan for future iterative retraining) is a significant hurdle.

  3. I aim to surpass an accuracy threshold of 80%, particularly with the larger set of combinations.

  4. However, given the current circumstances, I’m uncertain whether this goal is attainable or if a trade-off between accuracy and dataset size is inevitable.

I welcome any insights or suggestions to overcome these challenges and enhance the model’s performance across diverse combinations.

The simplest question I have is: is there a specific reason you are using GPT for a text classification task like this?

A simpler transformer-based model, or even a classical ML model, should be able to give you more consistent results, along the lines of the sketch below.
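As a rough sketch of what I mean (assuming scikit-learn, with texts as the list of claim texts and labels as the combined "Case Type | Case Category | Case Sub-category" strings):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Treat each combination as a single label and hold out a stratified test split
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

A baseline like this is also cheap enough to retrain every time you add combinations, and it makes it easy to see which combinations are being confused.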
