Improvement ideas for simple classification?

Hi all! The problem I’m trying to solve is sorting items into predefined categories. Right now I’m using the search API and ranking the results to pick a category, but I’m curious whether anyone has smarter suggestions.

For example, let’s say you have a long list of user-named folders like so:

animals
Animals
Aminals
pictures of animals
animals, insects, etc.
people
peoples
all the people
clocks
witches
clock times

And a fixed set of categories such as: [‘animals’, ‘people’, ‘other’]

Using the search API, ranking the results, and selecting the top result as the category works amazingly well when the set of categories is limited. But if I add a catch-all like “other”, almost nothing gets categorized into it. I then have to start tweaking with rules like “if the top score isn’t above a certain amount, it’s ‘other’”, and once I start drawing that score line, things slip through the cracks in real-world use cases.
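
For reference, here is a minimal sketch of the “score line” rule described above, with one extra check that is sometimes combined with it: falling back to the catch-all when the top two categories are nearly tied, not only when the top score is low. The function name, the `min_score`/`min_margin` parameters, and the example scores are hypothetical; the scores are made-up numbers on an arbitrary scale, not real search API output, and both thresholds would need tuning on labeled data.

```python
from typing import Dict


def pick_category(
    scores: Dict[str, float],
    min_score: float,
    min_margin: float = 0.0,
    fallback: str = "other",
) -> str:
    """Return the top-scoring category, or `fallback` when the winner is weak.

    `scores` maps each named category (excluding the catch-all) to a relevance
    score. `min_score` and `min_margin` are hypothetical tuning knobs.
    """
    if not scores:
        return fallback
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
    # Reject a winner whose absolute score is too low...
    if best_score < min_score:
        return fallback
    # ...and near-ties, where the top two categories are hard to tell apart.
    if best_score - runner_up < min_margin:
        return fallback
    return best_label


print(pick_category({"animals": 0.82, "people": 0.10}, min_score=0.5))  # animals
print(pick_category({"animals": 0.30, "people": 0.20}, min_score=0.5))  # other
```

The margin check targets ambiguous items directly, which may make the absolute threshold less sensitive to where exactly you draw it.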

Does anybody have interesting ideas for a prompt-based approach to categorizing into “A”, “B”, “C”, or a catch-all?
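
One prompt-based pattern, sketched under assumptions: list the catch-all as an explicit option with an instruction for when to use it, then map whatever text comes back onto the allowed labels, sending anything unexpected to the catch-all. `build_prompt` and `parse_answer` are hypothetical helpers; the prompt wording is illustrative, and the code deliberately does not call any API, so you would pass the prompt to whichever completion endpoint you use.

```python
from typing import List


def build_prompt(folder_name: str, categories: List[str], catch_all: str = "other") -> str:
    """Build a classification prompt that makes the catch-all an explicit option."""
    options = ", ".join(categories + [catch_all])
    return (
        f"Classify the folder name into exactly one of: {options}.\n"
        f"Answer '{catch_all}' if none of the other categories clearly applies.\n"
        f"Folder name: {folder_name}\n"
        "Category:"
    )


def parse_answer(reply: str, categories: List[str], catch_all: str = "other") -> str:
    """Map free-text model output onto an allowed label.

    Whitespace, a trailing period, and case are ignored; any reply that is not
    one of `categories` (including the catch-all itself) becomes `catch_all`.
    """
    answer = reply.strip().strip(".").lower()
    allowed = {c.lower(): c for c in categories}
    return allowed.get(answer, catch_all)


print(build_prompt("clock times", ["animals", "people"]))
print(parse_answer(" Animals.\n", ["animals", "people"]))  # animals
```

Giving the model an explicit rule for “other” tends to be easier to control than inferring it from a score cutoff, though it is worth checking on labeled examples.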


Thanks! I’m a newbie at this stuff. How do I measure precision and recall?
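
A minimal sketch, assuming you hand-label a sample of folder names and run your classifier on the same sample. For each category, precision is the fraction of items the classifier put in that category that truly belong there, and recall is the fraction of items that truly belong there that the classifier found. Low recall for “other” is exactly the “almost nothing gets categorized into it” symptom. The function name and sample labels below are illustrative; scikit-learn’s `sklearn.metrics.classification_report` computes the same figures if you already use that library.

```python
from typing import Dict, List, Tuple


def precision_recall(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float]]:
    """Per-category (precision, recall) from parallel lists of true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[label] = (precision, recall)
    return results


# Hypothetical hand labels vs. classifier output for five folders.
truth = ["animals", "animals", "people", "other", "other"]
preds = ["animals", "people", "people", "animals", "other"]
for label, (p, r) in precision_recall(truth, preds).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

Re-running this on the same labeled sample after each threshold change shows whether you are trading precision for recall on “other” or improving both.
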
As powerful as semantic search is, I’m running into name-matching cases where something like Levenshtein distance is more accurate. But deep down I really think OpenAI’s search could crush this if properly tweaked!
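
For the name-matching cases, here is a minimal sketch of Levenshtein (edit) distance with the catch-all as a fallback. Case and surrounding whitespace are normalized first, so “Animals” and “Aminals” land close to “animals”. `closest_category` and its `max_distance` cutoff are hypothetical and would need tuning.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def closest_category(name: str, categories: List[str], max_distance: int = 2, fallback: str = "other") -> str:
    """Return the category with the smallest edit distance to `name`, or `fallback` if none is close enough."""
    normalized = name.strip().lower()
    distance, best = min((levenshtein(normalized, c.lower()), c) for c in categories)
    return best if distance <= max_distance else fallback


for folder in ["Aminals", "peoples", "clocks"]:
    print(folder, "->", closest_category(folder, ["animals", "people"]))
```

Whole-string edit distance will not map a name like “pictures of animals”, which is where semantic search tends to do better. One option is to try edit distance first and fall back to the search ranking only when nothing is within `max_distance`.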
