Document Tagging 4o vs 4o-mini

Dmarx · December 12, 2024, 3:14pm

Hi,

I’m working on a tool that tags blog posts, emails and articles.

I’m NOT providing a pre-set list of tags, but instead asking the LLM to generate 10 - 20 tags/article based on the major topic/themes.

Curious if anyone has done something similar and found any major difference using 4o vs. 4o-mini for tagging?

Thanks!

merefield · December 12, 2024, 3:37pm

Some significant experience with this as I wrote the first AI plugin for Discourse. This implemented topic summarisation and includes a smart (LLM-based) tagging feature.

Some conclusions from that experience:

lack of a confined list will likely result in your tag set becoming too large and too overlapping over time ending up with no discrete categories and too many synonyms - a mess.
Prompting with a defined set works well
the best results come from prompting with a Completion, not using embedding and semantic similarity (surprisingly), but that’s not what you asked.
gpt4-turbo is way better than the 4o series at this (unfortunately it’s much more expensive too, but you really get what you pay for in this task)

I’ve been considering distilling gpt4-turbo responses into a fine tuned mini but the distilling toolset is a bit raw and incomplete atm.

Topic		Replies	Views
Optimum model for tagging articles? API	0	530	May 31, 2023
Auto-tagging articles - any thoughts? API	13	4693	May 31, 2023
Limits and limits and limits API	2	1451	May 31, 2021
How good is Davinci with Text Classification? API	7	1381	July 22, 2021
Help with fine-tuning for text categorization API	4	1334	December 16, 2023

Document Tagging 4o vs 4o-mini

Related topics