Does anyone have any thoughts/experience/links on writing something to auto-tag blog posts, news articles etc? I’m keen to hear/read ideas.
I’d like automation help tagging thousands of articles.
ChatGPT did a nice job in response to:
"Give me a bunch of tags for this article… xxx
Separate the tags into different taxonomies:
- Topics
- Companies
- People"
But, from one article to another, I noticed some some variance in the exact tag name used even though it should ostensibly be the same. That would be a problem for consistent tagging.
That prompted me to look into controlled vocabularies. IPTC Media Topics is the industry’s main such vocabulary, with 1,100 terms. I asked ChatGPT to use that, but it hallucinated - it cannot directly use that vocabulary. GPT 4 chat via Playground does a better job at this, but it still hallucinates, returning codes for Media Topics but incorrect terms.
Regardless, the results ChatGPT initially gave were actually preferable than something like IPTC Media Topics. Its understanding of language made for much richer, more granular suggested terms.
This creates a dilemma… if, instead of using a controlled vocabulary, I am to ask OpenAI to help me tag a mass of articles, how can I be confident it will use a consistent vocabulary?
Can “temperature” be used to influence this? I understand temperature corresponds to variability. If temperature is reduced or turned to 0, would this increase the chance the AI would use alternative labels each time?
Practically , I guess I would ask it to return results as CSV or JSON format.
Anyone else worked on something like this?