Categorising astronomy news articles

Marcel-Jan · May 10, 2023, 9:57pm

For a hobby project I’m creating a Python program that gets RSS feeds from astronomy news sources and, where needed, it creates tags for each item and it picks a main topic. The idea behind it that I’ve been categorizing solar system news by hand in an Evernote document so I can quickly find things when I prepare an astronomy presentation. And now I’m automating that process.

For this I use the openai.ChatCompletion API, model gpt-3.5-turbo. Tagging the news articles with this is working really rather well. I’m asking one to six tags and they are usually good enough.

But then I ask to choose a main topic where the article belongs under, and I ask to choose from a list of main topics. I’ve tried various ways to prompt this, but often the result is:

A topic outside my list
A topic that is plain wrong (for example: an article about space station ISS gets main topic “solar science”
I’ve asked when no category fits, to choose Miscellaneous. ChatGPT rarely uses that.

I’ve also followed the excellent ChatGPT Prompt Engineering course, but I’m still getting nowhere with this. Up to a point that I’m thinking I will have to keep doing this manually instead. Because on average 60-70% of the results have to be altered.

This is the prompt I’ve used most recently:

    prompt = f"""
                You are given a title and a summary of a text. \
                The title is delimited by triple asterixes. \
                The summary is delimited by triple backticks. \
                ***{title}*** \
                ```{summary_text}``` \

                You are also given a list of topics. \
                List of topics: {astro_category_list} \

                Determine what is the main topic for this title and text. \
                
                The response should follow the format: \
                Main category: maintopic \
                and nothing else. \
                """

In this last version of the prompt I have no longer asked to choose Miscellaneous if no category fits, but I would still want ChatGPT to do that.

I’ve tried different values of temperature, but that does very little. It only seems to pick “Solar Science” as main topic even more often.

Any ideas how this could work?

Marcel-Jan · May 27, 2023, 7:24am

In my code I used to ask the main category question as second input in my message history. Now I use a fresh prompt and that works much better. About 60-70% of the chosen main topics are correct. I do some vetting, before I store the results.

This is my current prompt:

    prompt = f"""
                You are given a title and a summary of a text. \
                The title is delimited by triple asterixes. \
                The summary is delimited by triple backticks. \
                ***{title}*** \
                ```{summary_text}``` \

                You are also given a list of topics. \
                List of topics: {astro_category_list} \

                Determine what is the main topic for this title and text. \
                If you have trouble finding a good main topic, \
                instead choose this topic: \
                Miscellaneous \
                The response should follow the format: \
                Main category: maintopic \
                and nothing else. \
                """

system · May 29, 2023, 7:24am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Summarisation of comments with a prioritisation on more common topics Prompting gpt-4 , chatgpt	1	1536	October 30, 2023
Gpt 3.5 Classified outside data labels Prompting gpt-35-turbo , api , prompt	2	1285	September 13, 2023
Force GPT 3.5 Turbo to choose an answer from a set of predefined options API	5	532	June 7, 2024
Using GPT 3.5 turbo for intent parsing for a custom chatbot Prompting gpt-35 , gpt-35-turbo , chatgpt , chatml , chatml-system	5	3468	December 27, 2023
Resolving ChatGPT hallucinations for text classification using IAB taxonomy Prompting gpt-4 , chatgpt	3	2490	July 23, 2023

Categorising astronomy news articles

Related topics