How to extract highlights or key points from a podcast?

Hi everyone,

I’m looking for a way to extract highlights or key points from a long podcast transcript in SRT format.

1
00:00:00,000 → 00:00:05,000
This is the first subtitle.

2
00:00:05,000 → 00:00:10,000
This is the second subtitle.

3
00:00:10,000 → 00:00:15,000
This is the third subtitle.

The desired result: [“00:00:00,000”, “00:00:00,000”, “key point”]

I’ve tried ChatGPT, but I found it inaccurate: it often goes off-topic and only covers a small part of the podcast transcript.

Hi and welcome to the developer forum!

Can you give some examples of the undesired output? If you can provide an example input, the prompt used, and the response generated, that would help a lot.


The undesired output:

  • ChatGPT didn’t respect the duration I provided. I asked for clips of 30-220 seconds, but it only gave me clips of 7 or 2 seconds.
  • ChatGPT sometimes goes outside the transcript boundaries. For example, I gave it a 3-minute transcript, but it gave me clips that were longer than 3 minutes.
  • The content of the clips did not match the timestamps.
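The first two problems can be detected mechanically before a response is accepted. Here is a minimal sketch in Python, assuming clips come back as [start, end, title] lists with SRT-style timestamps. The names `to_seconds` and `check_clip` are made up for illustration, and the 30/220 second limits are simply the ones from the prompt. It cannot detect the third problem (content not matching the timestamps).

```python
import re

# SRT-style timestamp: HH:MM:SS,mmm
TS = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3})$")

def to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    m = TS.match(ts.strip())
    if m is None:
        raise ValueError(f"not an SRT timestamp: {ts!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000

def check_clip(clip, first_ts, last_ts, min_len=30.0, max_len=220.0):
    """Return a list of problems found in one [start, end, title] clip."""
    start, end = to_seconds(clip[0]), to_seconds(clip[1])
    lo, hi = to_seconds(first_ts), to_seconds(last_ts)
    problems = []
    # The clip must lie inside the transcript.
    if start < lo or end > hi:
        problems.append("outside transcript boundaries")
    # The clip must respect the requested duration range.
    duration = end - start
    if duration < min_len:
        problems.append(f"too short ({duration:.3f}s)")
    elif duration > max_len:
        problems.append(f"too long ({duration:.3f}s)")
    return problems
```

Running `check_clip` on every clip in a response, with the first and last timestamps of the transcript as boundaries, would flag clips like the 2-second and 7-second ones so they can be rejected or re-requested.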

=> prompt:
Act as an expert in podcast transcript analysis. You are tasked with creating a comprehensive and engaging video based on a transcript podcast with a duration between 30 seconds and 220 seconds long.
The transcript follows a specific format, with conversation numbers, timestamps, speaker names, and their words.
Your main task is to extract key points from the transcript, covering one subject each with a duration between 30 seconds and 220 seconds long. Here are the steps to follow:

Please follow these steps:

  1. Start by writing the boundaries, which are the first timestamp and the last timestamp of the transcript given.
  2. Create a list of nano key points based on the topics or themes discussed in the podcast. Each nano key point should be clip of duration between 30 seconds and 220 seconds long, covering one subject and should be in the format [“{start timestamp}”, “{end timestamp}”, “{title}”]. Make sure to respect the boundaries.
  3. Ensure that each nano key point starts with the beginning of the subject or the topics or themes, not in the middle, to provide a clear context for the audience.
  4. Each nano key point should have a full context, not just a partial one, to make it easy for the audience to understand and follow along.

=> response:
Here are some nano key points extracted from the provided podcast transcript:
[“00:00:00,000”, “00:00:05,989”, “Introduction”]
The podcast starts with X introducing the show.
[“00:00:06,029”, “00:00:07,196”, “CGI in AI”]
X mentions that some people are working on CGI in AI.
[“00:00:08,200”, “00:00:09,494”, “Discussion on AI”]
Y joins the conversation, and they discuss AI-related topics.
[“00:00:10,361”, “00:00:11,071”, “Social Media Presence”]
Y talks about his current social media presence and internet activity.

From what you have posted, I would not be able to complete the task either. “The transcript follows a specific format, with conversation numbers, timestamps, speaker names, and their words.” does not describe a specific format. A specific format would be something like:
[Time-Date][Speaker Name][Transcript content for that speaker]

That is just an example. Also, LLMs are not good with time: a model has no idea how many words there are or how long they take to speak. It has no voice and no experience of what time means. You might be able to specify numeric values as a guide, but you need to think about how to make that aspect work, and it may require introducing traditional software methods. Your task is non-trivial and contains elements a language model will struggle with.
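As one possible way to bring in traditional software methods, here is a rough Python sketch that does the time arithmetic in code: it parses the SRT cues and greedily packs consecutive cues into chunks no longer than a maximum duration, so the model would only need to title or summarise each chunk. The names `parse_srt` and `chunk_cues` are hypothetical, and splitting purely by time is an assumption: it ignores topic boundaries, the last chunk can be shorter than 30 seconds, and a single cue longer than the limit still becomes its own chunk.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm"; also accepts the "→" arrow shown in the post.
CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*(?:-->|→)\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def _secs(h, m, s, ms):
    """Convert timestamp fields (as strings) to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = CUE_TIME.search(line)
            if m:
                g = m.groups()
                # Everything after the timing line is the cue text.
                cue_text = " ".join(lines[i + 1:]).strip()
                cues.append((_secs(*g[:4]), _secs(*g[4:]), cue_text))
                break
    return cues

def chunk_cues(cues, max_len=220.0):
    """Greedily pack consecutive cues into chunks of at most max_len seconds."""
    chunks, current = [], []
    for cue in cues:
        # Close the current chunk if adding this cue would exceed max_len.
        if current and cue[1] - current[0][0] > max_len:
            chunks.append(current)
            current = []
        current.append(cue)
    if current:
        chunks.append(current)
    # Each chunk becomes (start_seconds, end_seconds, joined_text).
    return [(c[0][0], c[-1][1], " ".join(t for _, _, t in c)) for c in chunks]
```

Each chunk's text could then be sent to the model with a request for a title only, while the start and end times are taken from the chunk itself, so they cannot fall outside the transcript.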
