Help with fine-tuning for text categorization

Hi everyone,

This will be a specific post, but my intention is not only to improve my own results; I also hope to post something from which others might learn. The results I got were poor, but I am not writing off fine-tuning as a process, I am trying to learn and improve.

My intention: Categorize articles into a one-word category after they are posted to a website.

What I did: Used different cheaper GPT models to test how it would work. I started with ada, then tried babbage and curie. Ada and babbage were trained for 4 epochs, while curie was trained for 16 epochs. I prepared 100 different real-life articles (subject, intro, and the first 200 characters of the full text, all joined together) and used them as prompts. I put “Categorize this text” in front of each text, and that was my prompt in the JSONL file. The completion was just the name of the category, e.g. “World” or “Retail”.
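For reference, here is roughly what one line of my JSONL looked like (the article text below is a made-up placeholder, not one of my real articles):

```python
import json

# Hypothetical example of one training record, as described above:
# prompt = "Categorize this text" + subject + intro + first ~200 chars,
# completion = the bare category name.
article = ("EU retailers report record online sales. "
           "Online sales across Europe grew sharply last quarter...")
record = {
    "prompt": "Categorize this text: " + article,
    "completion": "Retail",
}
print(json.dumps(record))
```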

What was the result? Pretty useless, actually. All models returned random text, such as comments on the article provided. Curie sometimes produced the right category, but only as part of a longer chunk of text that did not make much sense. I may have done something wrong and will try to fix it, but some help would be welcome, and it might help others as well.

What I am thinking at the moment:

  1. The number of prompt/completion pairs was probably too low. I will try to increase it.
  2. Should I use a different prompt? Would it help if I put “Categorize the following text into one of these categories (World, Retail, Marketing): article_text” in the prompt?
  3. Should I change the completion? Should I write “Category: Category_name” instead of just the name of the category?
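To make points 2 and 3 concrete, here is a sketch of how I might reformat the training data, following the commonly recommended pattern for legacy completion-style fine-tunes: end every prompt with a fixed separator, start the completion with a space, and end it with a stop token. The separator and stop strings here are my own choices, not anything mandated:

```python
import json

SEPARATOR = "\n\n###\n\n"   # marks the end of the article text
STOP = "\n"                 # stop sequence to request at inference time

def make_record(article_text: str, category: str) -> str:
    """Build one JSONL line in the separator/stop style."""
    return json.dumps({
        "prompt": article_text + SEPARATOR,
        # The leading space helps the label tokenize as a single unit;
        # the trailing stop string lets generation end right after it.
        "completion": " " + category + STOP,
    })

line = make_record("EU retailers report record online sales...", "Retail")
print(line)
```

At inference time the same separator would be appended to the article text, and the stop sequence passed to the completion call so the model returns only the label.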

Thanks for sharing your experience.

Have you considered using the embeddings API for classification?


Yes, I will probably need to go in that direction. Still, I am unsure about fine-tuning; surely it is not supposed to work this way? Why make it available in that case?

Fine-tuning can be very useful for the right use case.

However, for a simple classification task, it often makes more sense to use a simpler solution than fine-tuning.
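For anyone curious what the embeddings route looks like: embed a handful of labelled examples per category, average them into a centroid, and assign a new article to the nearest centroid by cosine similarity. The vectors below are tiny toy stand-ins; in practice they would come from the embeddings API (e.g. a model like text-embedding-ada-002, which returns ~1536-dimensional vectors):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(embedding, centroids):
    """Return the label whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda label: cosine(embedding, centroids[label]))

# Toy 3-d "embeddings" of labelled articles, grouped by category.
examples = {
    "World":  [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Retail": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {label: centroid(vecs) for label, vecs in examples.items()}

new_article_embedding = [0.05, 0.85, 0.25]  # would come from the API
print(classify(new_article_embedding, centroids))  # → Retail
```

No training run is needed; adding a category is just a matter of embedding a few more labelled examples.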