Summarisation of comments with a prioritisation on more common topics

I have a dataset of social media comments that I would like to summarise daily. Ideally, when I run the comments through ChatGPT, the summary would identify recurring themes within them. Here is my current prompt:


prompt = f"""
    As a professional summarizer, create a concise and comprehensive summary of the given Reddit comments from a particular week.

    Adhere to these guidelines: 
    1. If there are recurring themes within the comments, highlight this.
    2. Craft a summary that is detailed, thorough, in-depth, and complex, while maintaining clarity and conciseness. 
    3. Incorporate main ideas and essential information, eliminating extraneous language and focusing on critical aspects. 
    4. Rely strictly on the provided text, without including external information. 
    5. Format the summary in point form for easy readability. 
    
    
    COMMENTS: {list_of_comments}
    """

Using GPT-4, I observed that the model tries its best to include every topic in its summary, without indicating how often each one comes up.

For example, if there are 10 comments about Topic 1 and 3 comments about Topic 2, I would like the response to tease out this information and point out that Topic 1 was more heavily discussed.

Has anyone encountered a similar issue before? I understand that topic modelling might be a better alternative for my use case, but I’m wondering whether I could achieve a similar result with ChatGPT alone.

So far you look like you’re on the right track. Your prompting is good.
Have you tried prompting/asking it which topics are the most heavily discussed?
Perhaps telling GPT to create and organize summaries by topic, taking into account which topics were more heavily discussed, might help. This would also let you parse and chunk the data more easily, so that GPT can work through more of it across several inputs.
After it creates these topic-wise summaries, you can then elicit a response for an amalgamated summary of all the collected summaries thus far, pointing out which ones were the most and least discussed.
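
Here is a rough sketch of that two-stage idea, in case it helps. It is only one way to do it, not a tested recipe: `complete` is a hypothetical function you would write yourself to send a prompt to whichever chat model you use and return the reply text, and the helper names, prompt wording, and batch size are all made up for illustration. Stage 1 asks for one topic label per comment, batch by batch, and the counting is done in Python rather than by the model. Stage 2 puts those counts into the summary prompt so the model knows which topics dominated.

```python
from collections import Counter
from typing import Callable, List


def chunk(comments: List[str], size: int) -> List[List[str]]:
    """Split comments into consecutive batches of at most `size` items."""
    return [comments[i:i + size] for i in range(0, len(comments), size)]


def label_prompt(batch: List[str]) -> str:
    """Stage 1 prompt: ask for exactly one topic label per comment."""
    # Collapse whitespace so every comment occupies a single numbered line.
    numbered = "\n".join(
        f"{i}. {' '.join(c.split())}" for i, c in enumerate(batch, start=1)
    )
    return (
        "Assign each Reddit comment below a short topic label.\n"
        "Reuse the same label for comments about the same topic.\n"
        "Reply with one label per line, in the same order, and nothing else.\n\n"
        f"COMMENTS:\n{numbered}"
    )


def count_topics(
    comments: List[str],
    complete: Callable[[str], str],  # hypothetical: prompt in, reply text out
    batch_size: int = 20,            # assumed value; tune to your input limit
) -> Counter:
    """Label every batch with the model, then tally the labels in Python."""
    counts: Counter = Counter()
    for batch in chunk(comments, batch_size):
        reply = complete(label_prompt(batch))
        labels = [ln.strip().lower() for ln in reply.splitlines() if ln.strip()]
        # Ignore any extra lines the model may add beyond one per comment.
        counts.update(labels[:len(batch)])
    return counts


def summary_prompt(comments: List[str], counts: Counter) -> str:
    """Stage 2 prompt: summarise with the topic counts stated explicitly."""
    ranked = "\n".join(
        f"- {topic}: {n} comment{'s' if n != 1 else ''}"
        for topic, n in counts.most_common()
    )
    joined = "\n".join(f"- {' '.join(c.split())}" for c in comments)
    return (
        "Summarise the Reddit comments below in point form.\n"
        "Order the points from most to least discussed, using the counts\n"
        "given, and state how many comments each point is based on.\n"
        "Rely strictly on the provided text.\n\n"
        f"TOPIC COUNTS:\n{ranked}\n\n"
        f"COMMENTS:\n{joined}"
    )
```

You would call `count_topics(list_of_comments, complete)` first, then send `summary_prompt(list_of_comments, counts)` as the final request. Two caveats: labels may drift between batches (for example "price" versus "pricing"), so you may need to merge near-duplicates before counting, and this version assumes all comments still fit into the final request. If they don't, you could pass the per-topic summaries into stage 2 instead of the raw comments.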

Let me know if this helps, or if you need any more details!