How to further improve Product Categorization Task?

I have a task of categorizing product into one of ~5000 category.

So far I have achieved approx 80% accuracy with prompt engineering using gpt-3.5-turbo (can’t use gpt4 due to cost limitation).

I required the accuracy to be 90% for productization.

Those that is not accurate, the category picked by model isnt wrong, but there are just a better category.

For example, For this product Description
“”"
Serum Anticycrices to Repair Scars, Serum for Tattoos of eyebrows, Healthy Biological Active Ingredient to Repair Scars, Serum to Remove Tattoos
“”"

The model has chosen Scar Removal, but there is a better category Tattoo Aftercare

I have tried fine tuning with 2000 datapoints but the performance seemed turned worst.

Any idea to further try out? Any successful example / paper that anyone can share?

Welcome. Switching this to prompting category.

Can you share your system message with us? What temperature are you using?

Fine-tune data example?

Why would that be a "better* category?

The description says “repair scars” twice (scar removal), “serum for tattoos” (tattoo aftercare), and “serum to remove tattoos” (not tattoo aftercare but tattoo removal which given the two “repair scars” could be construed as a type of “scar removal”). Also, the description mentions “scars” before tattoos.

If I, as a human (I swear), were given this description and those two options as categories I would almost certainly clarify the product under “scar removal.”

So, if the classification is wrong, it’s either a bad description of the product by someone SEOing the description or the product itself has multiple uses and this could fall into multiple categories.

You could try to have the model perform the classifications as mixed-memberships, giving each product 2 or 3 classes. Then, in a second pass have it re-evaluate its choices and rank whatever remains.

5,000 classes is a lot of classes though, I doubt the model will be able to effectively classify into that many discrete classes in a single shot. You’ll also never be able to fine-tune a classification model with 5,000 classes on 2,000 examples.

My suggestion would be to do this,

  1. Cluster your classes. If you have 5,000 classes, try to convert that into something like 100 groups of 50 classes. Note: the groups don’t need to have an equal number of members, this is just an example.
  2. Do a hierarchical clustering. First identify the group the product belongs to, then send the description back through a second time with only the classes in that group as options.

This will likely even reduce your costs because, instead of sending 5,000 classes in the first pass you send \sim100 groups. Removing 98\% of the number of classes will be a huge reduction in tokens so, as long as the product descriptions aren’t huge I expect your total token count will drop a ton.

In this case maybe these two classes would be in the “skincare” super-class.

The model would almost certainly identify this description as being for a skincare product, then it would only need to do a second pass to get the final category.

Another which will help, number the categories and ask the model to return the number of the category.

Let’s say, “tattoo aftercare” is number 38 and “scar removal” is number 53.

So, the model is now returning 53, great.

Next, set max_tokens to 1. Now the response will always end immediately after the number returned, guaranteed to save costs!

Finally, instead of asking the model to do any kind of mixed-membership classification, we can get those ourselves! Turn on logprobs, set top_logprobs to 5, and see the categories the model thinks are the top-5 most likely! The best parts about this are,

  1. It’s free!
  2. It’s much more accurate than asking the model to do it.

Now you have its top five choices and some idea of how confident it is in those choices!

Note that the model may occasionally want to choose something other than the number to start, you can just ignore those.

2 Likes