Please help. I need to generate keywords from my article content. How can I do it? I tried, but without success.
Hi @standus and welcome to the forums!
What approach have you used so far without success (e.g. do you have a sample prompt)? And what is your measure of success here? Thanks!
Hi @standus
Welcome to the community!
You may modify the following prompt for your needs:
You are a professional content analyzer, and your primary role is to identify and extract the most relevant keywords from a given article. Your goal is to focus on the main ideas, themes, and key phrases that best represent the article’s content.
I will upload a file that contains an article.
Please follow these steps:
1. Carefully read the entire article to understand its core topics and main ideas.
2. Extract 10-15 relevant keywords or key phrases that accurately reflect the key concepts discussed in the article. Focus on terms that someone might use to search for this type of content online.
3. Avoid common stop words like "the", "and", or "with". Also, avoid overly generic terms unless they are critical to the topic.
4. Prioritize phrases over single words where it makes sense, especially if the phrase better captures a core idea (e.g., "artificial intelligence" instead of just "intelligence").
5. Present the keywords as a comma-separated list at the end of your response for easy readability.
Your goal is to deliver a concise list of keywords that captures the essence of the article in a way that would be useful for SEO or content categorization.
###
Let me know when you are ready, and I will upload the file.
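If the keywords are consumed by code rather than read by a person, the prompt's rules can also be checked after the fact. A minimal sketch, assuming the response has already been turned into a Python list of strings; check_keywords and the small STOP_WORDS sample are hypothetical names for illustration, not part of any library:

```python
# Hypothetical sample of stop words (step 3 of the prompt); extend as needed
STOP_WORDS = {"the", "and", "with", "a", "an", "of", "or", "to", "in"}


def check_keywords(keywords: list[str], min_count: int = 10, max_count: int = 15) -> list[str]:
    """Return a list of rule violations; an empty list means the output passed."""
    problems = []
    # Step 2 of the prompt asks for 10-15 keywords
    if not min_count <= len(keywords) <= max_count:
        problems.append(f"expected {min_count}-{max_count} keywords, got {len(keywords)}")
    seen = set()
    for kw in keywords:
        norm = kw.strip().lower()
        # Step 3: a keyword should not be a bare stop word
        if norm in STOP_WORDS:
            problems.append(f"stop word used as keyword: {kw!r}")
        # Case-insensitive duplicates add nothing for SEO or categorization
        if norm in seen:
            problems.append(f"duplicate keyword: {kw!r}")
        seen.add(norm)
    return problems
```

For example, check_keywords(["large language models", "the"]) reports both the wrong count and the stop word, which you could use to decide whether to retry the request.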
Ohh, it works as expected. Thank you very much.
This prompt does not work: it generates a numbered list instead of a comma-separated list.
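If you keep a free-text prompt, one workaround is to parse the reply tolerantly instead of relying on the exact format. A minimal sketch, assuming the model returns either a numbered/bulleted list or a final comma-separated line (optionally prefixed by a label such as "Keywords:"); parse_keywords is a hypothetical helper:

```python
import re

# A list marker at the start of a line: "1.", "2)", "-", "*" or "•", followed by whitespace
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_keywords(text: str) -> list[str]:
    """Extract keywords from a numbered/bulleted list or a trailing comma-separated line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Prefer list items if the model answered with a numbered or bulleted list
    listed = [_MARKER.sub("", ln) for ln in lines if _MARKER.match(ln)]
    if listed:
        items = listed
    elif lines:
        # Otherwise use the last line, dropping an optional "Keywords:" style label
        items = lines[-1].split(":", 1)[-1].split(",")
    else:
        items = []
    return [item.strip() for item in items if item.strip()]
```

This is heuristic: a keyword that itself contains a colon or comma will be split incorrectly, which is one reason a schema-based approach is more reliable.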
@standus, if you expect the keywords in a particular format, I suggest using Structured Outputs, which guarantees that the response follows a schema you define.
If you are using Python and Pydantic, it could be as simple as:
from pydantic import BaseModel, Field

class Keywords(BaseModel):
    keywords: list[str] = Field(description="List of keywords that capture the key themes in the supplied article")
Then pass this model as the response_format in your Chat Completions API call:
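from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# system_prompt and article_text are placeholders for your own prompt and article text
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": article_text},
    ],
    response_format=Keywords,
)
keywords = completion.choices[0].message.parsed.keywords  # a plain list of strings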
On optimal keyword extraction: I have done this many times before, including in production at a large news media company. In general, keywords are very business-specific, and what counts as a good fit depends heavily on your goals.
To give you an example: we had to ensure that people’s names never appeared in the keywords (for privacy-compliance reasons), that certain “negative” keywords were never included (e.g. murder, stabbing, etc.), and we were less interested in generic keywords than in niche keywords intrinsic to the article (e.g. instead of “AI”, we wanted “Large Language Models” or “AI Agents”).
All of the above needs to go into the prompt to guide the LLM towards your business objectives.
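A minimal sketch of that idea, assuming rules like the ones above are kept as plain lists: build_system_prompt appends them to a shortened version of the earlier prompt, and filter_keywords acts as a deterministic safety net afterwards. BLOCKED, TOO_GENERIC, BASE_PROMPT and both functions are hypothetical names for illustration. The filter only does whole-word matching, so it would miss variants like “murders”, and it does not detect people’s names; that rule is left to the prompt here.

```python
# Hypothetical business rules; replace with your own lists
BLOCKED = {"murder", "stabbing"}
TOO_GENERIC = {"ai", "news", "technology"}

BASE_PROMPT = (
    "You are a professional content analyzer. Extract 10-15 keywords "
    "that capture the key themes of the supplied article."
)


def build_system_prompt(blocked: set[str], generic: set[str]) -> str:
    """Append the business rules to the base prompt so the model sees them."""
    rules = [
        "Never include people's names.",
        "Never include these terms: " + ", ".join(sorted(blocked)) + ".",
        "Prefer niche, article-specific phrases over generic terms such as "
        + ", ".join(sorted(generic)) + ".",
    ]
    return BASE_PROMPT + "\n\nRules:\n" + "\n".join(f"- {r}" for r in rules)


def filter_keywords(keywords: list[str], blocked: set[str], generic: set[str]) -> list[str]:
    """Drop any keyword that still breaks a rule after generation."""
    kept = []
    for kw in keywords:
        words = set(kw.lower().split())
        if words & blocked:
            continue  # a blocked term appears as a whole word in the phrase
        if kw.lower() in generic:
            continue  # the entire keyword is a generic term
        kept.append(kw)
    return kept
```

For example, filtering ["AI", "AI Agents", "stabbing incident", "Large Language Models"] keeps only "AI Agents" and "Large Language Models".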