I need to find a way to classify websites/domains into several categories (e.g., the ability to precisely answer the question “Is this website a web design agency”->yes|no).
I found out that the results provided by chatgpt-4o API aren’t reliable enough. I have tested both approaches - directly asking for a domain category and providing HTML META+text.
I just learned it is possible to fine-tune the ChatGpt models. Is this the way to refine the results? Or should I use Embeddings instead? Or completely different technology/solution?
I can prepare a training dataset consisting of let’s say 500 (yes)+500 (no) meta+text website samples.