A few approaches.
First, the “easy” one, though it may not work entirely …
Train on a “None” category: create a "category": " 0" label and send all your uncategorized training data to it.
The only problem is that there isn’t anything specific or keyword-like for the model to learn from, so there are doubts this will work, but it’s worth a try.
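As a rough sketch, the fine-tuning data in the legacy prompt/completion JSONL format might look like this (the separator arrow and the label tokens here are placeholders to adapt to your setup):
{"prompt": "Text from one of your real categories ->", "completion": " 1"}
{"prompt": "Off-topic, uncategorized text ->", "completion": " 0"}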
Another option, possibly more reliable, is to use the log probabilities reported for your already-trained categories: if the top token scores high for its respective category, use that category; otherwise treat the input as uncategorized.
For example (in Python):
import math

# get the log probability (base e) of the first generated token
log_prob = FullReaction["choices"][0]["logprobs"]["token_logprobs"][0]
print(f"LogProb: {log_prob}")

# convert the log probability to a probability in [0, 1]
prob = math.exp(log_prob)
print(f"Prob: {prob}")  # if this is > 0.8 (or some other high threshold), use the category
You can of course do both, i.e., train on " 0" and use log_probs.
For a binary classifier, you would have parameters like so:
{"temperature": 0, "max_tokens": 1, "top_p": 1, "logprobs": 2, "frequency_penalty": 0, "presence_penalty": 0}
You need to include “logprobs” in the API call to get them back. However, the maximum number of values it will return is the top 5. If you look at these, you can see how the model’s “confusion” is ranked across the top 5 tokens, or across up to your top 5 categories if you set "logprobs": 5 in your API call.
Here is what came back when I sent "logprobs": 2:
"logprobs": {"tokens": [" 1"], "token_logprobs": [-0.07227323], "top_logprobs": [{" 1": -0.07227323, " 0": -2.6639233}], "text_offset": [42]}
You can also go with "logprobs": 2, like me, even in the multi-class case, but then you will only see the top 2 predicted tokens. The slight risk is that those top 2 tokens may not map to your intended category tokens, so setting a higher value will catch those scenarios. One way to act on the returned logprobs is sketched below.
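For example (a minimal sketch; response is the completion result from the call above, and the category-token set and the 0.8 threshold are assumptions to tune for your data):
import math

# hypothetical set of tokens your fine-tune was trained to emit
CATEGORY_TOKENS = {" 0", " 1", " 2"}

# top_logprobs is a list (one dict per generated token) of token -> logprob;
# take the dict for the first (only) generated token
top_logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]
best_token, best_logprob = max(top_logprobs.items(), key=lambda kv: kv[1])

# accept the prediction only if it is a known category and scores high
if best_token in CATEGORY_TOKENS and math.exp(best_logprob) > 0.8:
    category = best_token.strip()
else:
    category = None  # treat as uncategorized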
Lastly, if none of this works, you could train a binary classifier and use it up front, where the training label is ' 0' for things that are off topic and ' 1' for things that are on topic. You would use logprobs here too, to check how certain the prediction is before proceeding.
Then send only the on-topic things to your current classifier. Filtering up front like this is usually better, since it simplifies your logic downstream. But you could combine this with all of the above to create a robust classification system.
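Putting it together, the two-stage flow might look like this (a sketch; classify_on_topic and classify_category are hypothetical helpers wrapping the binary and multi-class fine-tunes, each returning the predicted token plus its probability, and the 0.8 thresholds are assumptions):
# hypothetical two-stage pipeline
def categorize(text):
    on_topic, confidence = classify_on_topic(text)   # binary ' 1' / ' 0' model
    if on_topic != " 1" or confidence < 0.8:
        return None  # off topic or too uncertain: don't categorize
    category, confidence = classify_category(text)   # multi-class model
    if confidence < 0.8:
        return None  # categorized, but not confidently enough
    return category.strip()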