I am trying to build a multi-class classifier for sentences. There are 4 classes today; let's call them A, B, C, D. Some sentences may not fall into any of these categories, and it is very important that the classifier labels those as "other".
I've tried an approach where I set a logprob threshold for a label: if the classifier returns a value below the threshold, I override the classification as "other". But this didn't work very well — I began getting a lot of false negatives, i.e. sentences that genuinely belonged to A–D were overridden to "other".
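For reference, the override logic I used was roughly this sketch (the threshold value, function name, and label scores are illustrative, not my exact setup):

```python
def classify_with_threshold(label_logprobs, threshold=-0.5):
    """Pick the highest-logprob label, but fall back to 'other'
    when even the best label's logprob is below the threshold."""
    best_label = max(label_logprobs, key=label_logprobs.get)
    if label_logprobs[best_label] < threshold:
        return "other"
    return best_label

# e.g. with logprobs taken from the model's top tokens:
classify_with_threshold({"A": -0.1, "B": -3.2, "C": -4.0, "D": -4.5})
```

The problem is that a single global threshold trades off the classes against each other: set it high enough to catch most "other" sentences and it starts rejecting valid A–D predictions too.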
If we treat "other" as just another label, then we would need to include examples of "other" sentences in the fine-tuning set. However, I was hoping that since gpt-3.5 is an LLM already trained on a large amount of text, it would have an internal representation of "other" out of the box.
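To be concrete, by "including 'other' examples" I mean adding deliberately off-topic sentences to the training JSONL alongside the in-class ones, in the standard chat fine-tuning format — something like this (the sentences and labels here are made up for illustration):

```python
import json

SYSTEM = "Classify the sentence as A, B, C, D, or other."

# (sentence, label) pairs; the last one is intentionally off-topic
examples = [
    ("The quarterly revenue grew by 12%.", "A"),
    ("Ship the package via overnight courier.", "B"),
    ("Colorless green ideas sleep furiously.", "other"),
]

with open("train.jsonl", "w") as f:
    for sentence, label in examples:
        record = {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": sentence},
            {"role": "assistant", "content": label},
        ]}
        f.write(json.dumps(record) + "\n")
```

My worry with this is that "other" is unbounded — any sampling of negative examples covers only a tiny slice of what "other" can be.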
Is my intuition correct here? If so, how do I fine-tune a model so that it handles the "other" class well?