Categorising astronomy news articles

For a hobby project I’m creating a Python program that gets RSS feeds from astronomy news sources and, where needed, it creates tags for each item and it picks a main topic. The idea behind it that I’ve been categorizing solar system news by hand in an Evernote document so I can quickly find things when I prepare an astronomy presentation. And now I’m automating that process.

For this I use the openai.ChatCompletion API, model gpt-3.5-turbo. Tagging the news articles with this is working really rather well. I’m asking one to six tags and they are usually good enough.

But then I ask to choose a main topic where the article belongs under, and I ask to choose from a list of main topics. I’ve tried various ways to prompt this, but often the result is:

  • A topic outside my list
  • A topic that is plain wrong (for example: an article about space station ISS gets main topic “solar science”
  • I’ve asked when no category fits, to choose Miscellaneous. ChatGPT rarely uses that.

I’ve also followed the excellent ChatGPT Prompt Engineering course, but I’m still getting nowhere with this. Up to a point that I’m thinking I will have to keep doing this manually instead. Because on average 60-70% of the results have to be altered.

This is the prompt I’ve used most recently:

    prompt = f"""
                You are given a title and a summary of a text. \
                The title is delimited by triple asterixes. \
                The summary is delimited by triple backticks. \
                ***{title}*** \
                ```{summary_text}``` \

                You are also given a list of topics. \
                List of topics: {astro_category_list} \

                Determine what is the main topic for this title and text. \
                
                The response should follow the format: \
                Main category: maintopic \
                and nothing else. \
                """

In this last version of the prompt I have no longer asked to choose Miscellaneous if no category fits, but I would still want ChatGPT to do that.

I’ve tried different values of temperature, but that does very little. It only seems to pick “Solar Science” as main topic even more often.

Any ideas how this could work?

In my code I used to ask the main category question as second input in my message history. Now I use a fresh prompt and that works much better. About 60-70% of the chosen main topics are correct. I do some vetting, before I store the results.

This is my current prompt:

    prompt = f"""
                You are given a title and a summary of a text. \
                The title is delimited by triple asterixes. \
                The summary is delimited by triple backticks. \
                ***{title}*** \
                ```{summary_text}``` \

                You are also given a list of topics. \
                List of topics: {astro_category_list} \

                Determine what is the main topic for this title and text. \
                If you have trouble finding a good main topic, \
                instead choose this topic: \
                Miscellaneous \
                The response should follow the format: \
                Main category: maintopic \
                and nothing else. \
                """

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.