Semantic Text Classification and Explainable AI

bhuwanvamsi · November 15, 2024, 7:26pm

Seeking Advice: Legal Document Classification with XAI

Hi everyone,

I’m working on a project involving classification of legal documents, with an emphasis on explainable AI (XAI) to build trust and interpretability for end users, especially legal practitioners. The dataset consists of lengthy, unstructured legal texts (court rulings, arguments, etc.). I’d love to hear your insights and advice on some of the challenges I’m facing!

My Work So Far

Preprocessing

Boilerplate Removal: Filters repetitive legal jargon and irrelevant text.
Stopword Removal
NER and Lemmatization: Extracts key entities and normalizes words.
Hierarchical Chunking: Splits long documents into smaller chunks with overlaps to retain context.
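
For readers who want something concrete, here is a minimal sketch of the overlapping-chunk step. It assumes the document is already a list of tokens (plain whitespace tokens here, not LegalBERT's subword tokens). The function name `chunk_tokens` and the default `chunk_size` and `overlap` values are illustrative assumptions, not the settings used in this project.

```python
def chunk_tokens(tokens, chunk_size=400, overlap=50):
    """Split a token list into fixed-size windows that share `overlap` tokens.

    chunk_size and overlap are illustrative defaults (assumptions).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window already reaches the end of the document,
        # so no trailing chunk made purely of overlap is produced.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    text = "The appellant contends that the lower court erred in its findings"
    for chunk in chunk_tokens(text.split(), chunk_size=5, overlap=2):
        print(" ".join(chunk))
```

Each chunk repeats the last `overlap` tokens of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one window when the overlap is large enough.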

Model Architecture

I’m using a LegalBERT-based classifier fine-tuned for legal text understanding:

LegalBERT: Extracts contextual embeddings.
Neural Layers:

BiLSTM + GRU: Captures sequential dependencies and contextual patterns.
GlobalMaxPooling
Dense + Dropout

Output Layer: Softmax for classifying legal categories.

Despite my efforts, I’ve only achieved an accuracy of ~50% on the test data. I suspect that better preprocessing or semantic integration could help improve performance.

Challenges and Questions

Long and Unstructured Documents:

My dataset consists of lengthy, unstructured texts. Are there efficient techniques for preprocessing or segmenting such data to better capture semantics and structure?

Incorporating Semantics and Rhetorical Roles:

I’d like to integrate semantic understanding into the model and identify rhetorical roles (e.g., facts, issues, arguments) automatically. Are there any pre-trained models or frameworks you recommend for this?

Explainability:

For clear and effective explainability, are attention mechanisms, SHAP, or LIME suitable for legal contexts? Are there other XAI approaches tailored for legal document classification?
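
To make the question concrete, below is a rough sketch of occlusion (leave-one-out) attribution, a simpler perturbation-based relative of LIME. It is not LIME or SHAP itself. The classifier `toy_predict_proba` is a hypothetical keyword-counting stand-in, not the LegalBERT model. In practice `predict_proba` would wrap the real classifier and return class probabilities for a token list.

```python
def occlusion_scores(tokens, predict_proba, target_class):
    """Score each token by how much removing it lowers the target-class probability."""
    base = predict_proba(tokens)[target_class]
    scores = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # document with token i removed
        drop = base - predict_proba(reduced)[target_class]
        scores.append((token, drop))
    return scores


def toy_predict_proba(tokens):
    """Hypothetical stand-in classifier: returns [p(other), p(contract dispute)].

    The probability of class 1 grows with the number of keyword hits.
    """
    keywords = {"breach", "contract"}
    hits = sum(t.lower() in keywords for t in tokens)
    p = hits / (hits + 1)
    return [1 - p, p]


if __name__ == "__main__":
    doc = "The contract breach occurred in March".split()
    ranked = sorted(occlusion_scores(doc, toy_predict_proba, target_class=1),
                    key=lambda pair: pair[1], reverse=True)
    for token, drop in ranked:
        print(f"{token:>10}  {drop:+.3f}")
```

Tokens with the largest positive drop are the ones the prediction depends on most. For chunked legal documents the same idea can be applied at sentence or chunk level, which is cheaper and may be easier for practitioners to read than word-level scores.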

Focus

My primary focus is on improving semantic integration in the classification process.

If you’ve faced similar challenges or have insights on tools, frameworks, or strategies for legal AI projects, I’d love to hear about your experiences!

Thank you in advance for your time and support!
