Resolving ChatGPT hallucinations for text classification using IAB taxonomy

Hi everybody. I was testing ChatGPT (both 3.5 and 4) on a simple text classification task: given a text, return a list of IAB taxonomy categories with their respective confidence weights. For example, for a text like this:

Creating safe AGI that benefits all of humanity

I want to get a JSON structure like this:

  {
    "categories": [
      { "category": "Technology & Computing", "confidence": 0.9 },
      { "category": "Artificial Intelligence", "confidence": 0.8 },
      { "category": "Science", "confidence": 0.6 },
      { "category": "Education", "confidence": 0.5 }
    ]
  }

This works fine for the most part, but IAB 3.0 came out in 2022, so the model does not know about it (it only knows earlier versions of the taxonomy). It also sometimes slightly alters category names, and I need the exact names.

I tried to solve this problem using two tactics:

  1. Attach the full, latest category list along with the text I need to classify. Cons: the list is huge (around 700 categories), so it eats up tokens, which makes each request costly and also limits the size of the text being classified.
  2. Remove the categories from the initial prompt and use a secondary prompt that asks the model to map any returned category not present in IAB 3.0 onto one that is. Pros: the same request-cost concerns apply, but I started caching the mappings the model has already produced, so over time I should need the second request less and less. Cons: because it is just a separate prompt for mapping categories, I sometimes lose information. For example, when the model comes up with something like “jewelry” but the taxonomy has two separate categories, Men’s and Women’s jewelry, the model is forced to just guess. I am thinking of making it an additional message within the same conversation, rather than a separate prompt, to make the results more precise.
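
To make tactic 2 concrete, here is a rough Python sketch of the check-then-map-then-cache flow. It is only an illustration under assumptions: `CategoryMapper`, `resolve_all`, and the `ask_model` callable are hypothetical names, `ask_model` stands in for whatever code sends the secondary mapping prompt, and the taxonomy set would be loaded from the real IAB 3.0 list.

```python
from typing import Callable, Dict, List, Optional, Set


class CategoryMapper:
    """Maps free-form model categories onto exact taxonomy names, with a cache."""

    def __init__(self, taxonomy: Set[str], ask_model: Callable[[str], str]):
        self.taxonomy = taxonomy
        # Hypothetical hook: wraps the secondary "map this category" prompt.
        self.ask_model = ask_model
        # Remembers earlier mappings so repeated categories cost no extra request.
        self.cache: Dict[str, str] = {}

    def resolve(self, category: str) -> Optional[str]:
        # Exact taxonomy names pass through untouched.
        if category in self.taxonomy:
            return category
        key = category.strip().lower()
        if key not in self.cache:
            self.cache[key] = self.ask_model(category)
        mapped = self.cache[key]
        # Reject anything the model invented that is still not in the taxonomy.
        return mapped if mapped in self.taxonomy else None


def resolve_all(mapper: CategoryMapper, items: List[dict]) -> List[dict]:
    """Rewrites each {"category", "confidence"} item, dropping unmappable ones."""
    resolved = []
    for item in items:
        name = mapper.resolve(item["category"])
        if name is not None:
            resolved.append({"category": name, "confidence": item["confidence"]})
    return resolved
```

The cache is keyed on the lowercased category, so small spelling variations of the same label share one mapping. It does not address the “jewelry” ambiguity, since the mapping step still does not see the original text.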

Sorry for the long introduction. My question is basically this: since I’m very new to the whole LLM world, am I missing some better tactics or ideas that could help in this case?

At the moment, the method you are using is about as good as it gets. Any other method, including fine-tuning and embeddings, will probably not perform as well. If you had thousands of example prompt/reply pairs you could fine-tune a model, but you would then be using a lower-order base model and not GPT-3.5. You may be able to fine-tune it for new rules, but that is not the way to give it new data. Prompting is the best way so far. There is indeed a token cost associated with that, and there may be some room for tuning the prompts, but as you have found, the process is iterative and takes time. I hope things go well for you.


Thanks for your input! Makes sense. I just wanted to make sure I am not missing something obvious :)


I did something similar in 2005.

Had to classify a huge load of websites and sort them into a catalogue.

The idea was to take the words from a website, remove all stopwords, build a keyword-density list, and store that website’s keywords in a database.

Then you just need to check the similarity between websites (using synonyms and synonym phrases that were found automatically after classification).

As a base there was just a handful of websites, and the categories had to be created automatically.
And bingo. No “ML”, just SQL.
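
For anyone who wants to try the idea, a rough Python sketch of the keyword-density comparison follows. The original was plain SQL, so this is a stand-in and not that implementation. Assumptions: the tiny stopword list is only illustrative, cosine similarity is one possible similarity measure (the exact one is not specified above), the synonym step is left out, and the function names are made up for the example.

```python
import math
import re
from collections import Counter
from typing import Dict

# Tiny illustrative stopword list (assumption; a real list would be much larger).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "that", "all"}


def keyword_density(text: str) -> Dict[str, float]:
    """Returns each non-stopword's share of all non-stopwords in the text."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    if not words:
        return {}
    counts = Counter(words)
    return {word: count / len(words) for word, count in counts.items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse keyword-density vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_category(text: str, seeds: Dict[str, str]) -> str:
    """Picks the seed category whose example text is most similar to `text`."""
    density = keyword_density(text)
    return max(
        seeds,
        key=lambda name: cosine_similarity(density, keyword_density(seeds[name])),
    )
```

Here `seeds` maps a category name to the text of a site already known to belong to it, which plays the role of the handful of base websites.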
