I’m currently testing the waters with using Whisper to generate full-text transcripts AND captions for videos. My workflow looks like this:
- Use Whisper locally to generate SRT captions (timestamped) and a TXT transcript (no timestamps).
- Manually make the necessary corrections with a colleague (Google Docs) in both the captions and the transcript.
- Publish the captioned video (not forced / burned-in subs) + a separate transcript document.
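For reference, the first step can be sketched in Python roughly as below. This assumes the `openai-whisper` package and its `load_model` / `transcribe` calls; the helper names (`format_timestamp`, `segments_to_srt`, `segments_to_txt`, `transcribe_to_files`) and the `"base"` model choice are made up for illustration. The point is only that both files come from the same list of segments.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments shaped like {'start', 'end', 'text'}."""
    blocks = []
    for n, seg in enumerate(segments, start=1):
        times = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{n}\n{times}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_txt(segments):
    """Build the plain transcript (no timestamps) from the same segments."""
    return " ".join(seg["text"].strip() for seg in segments) + "\n"


def transcribe_to_files(media_path, stem, model_name="base"):
    """Hypothetical wrapper: run Whisper once, write <stem>.srt and <stem>.txt."""
    import whisper  # openai-whisper; imported lazily so the helpers above run without it

    model = whisper.load_model(model_name)
    segments = model.transcribe(media_path)["segments"]
    with open(f"{stem}.srt", "w", encoding="utf-8") as f:
        f.write(segments_to_srt(segments))
    with open(f"{stem}.txt", "w", encoding="utf-8") as f:
        f.write(segments_to_txt(segments))
```

The command-line tool can also write several formats in one run via `--output_format`, so this is more about showing where the two files share a source than about replacing the CLI.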
The problem here is that every correction has to be made in both the SRT and the TXT version, so this workflow duplicates the editing work across two files/documents.
Does Whisper provide a feature or workaround for this, such as alignment? E.g.:
- Generate TXT transcript,
- Make corrections,
- Generate SRT with corrections and timestamps according to original file?
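One way the three steps above could be approximated without any audio-level alignment is a plain text diff: match the words of the corrected transcript against the words in the original SRT and let each corrected word inherit the timestamps of the cue it lines up with. The sketch below uses only the standard library (`difflib`, `re`); `parse_srt`, `norm`, and `realign` are hypothetical helper names, not Whisper features. It assumes the corrections are small edits (spelling, punctuation, the odd inserted or removed word). Heavy rewrites will drift, and the cue boundaries stay exactly where Whisper put them.

```python
import difflib
import re


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed or empty cues
        start, end = [part.strip() for part in lines[1].split("-->")]
        cues.append((start, end, " ".join(lines[2:])))
    return cues


def norm(word):
    """Lowercase and drop punctuation so 'World,' still matches 'world'."""
    return re.sub(r"[^\w']", "", word.lower())


def realign(srt_text, corrected_text):
    """Rebuild an SRT using the original cue timings and the corrected words."""
    cues = parse_srt(srt_text)
    old_words, old_cue = [], []  # each original word and the index of its cue
    for i, (_, _, text) in enumerate(cues):
        for w in text.split():
            old_words.append(norm(w))
            old_cue.append(i)
    if not old_words:
        raise ValueError("SRT contains no cue text")

    new_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(
        None, old_words, [norm(w) for w in new_words], autojunk=False
    )
    new_cue = [0] * len(new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        for j in range(j1, j2):
            if tag == "insert":
                # New word with no counterpart: attach to the preceding cue.
                new_cue[j] = old_cue[max(i1 - 1, 0)]
            elif tag in ("equal", "replace"):
                # Spread the new words proportionally over the old span.
                new_cue[j] = old_cue[i1 + (j - j1) * (i2 - i1) // (j2 - j1)]

    grouped = {}
    for w, c in zip(new_words, new_cue):
        grouped.setdefault(c, []).append(w)
    blocks = []
    for n, c in enumerate(sorted(grouped), start=1):
        start, end, _ = cues[c]
        blocks.append(f"{n}\n{start} --> {end}\n{' '.join(grouped[c])}\n")
    return "\n".join(blocks)
```

With this approach only the TXT (or Google Doc) needs editing; the SRT is regenerated from the original captions plus the corrected text. Cues whose words were all deleted are dropped and the remaining cues are renumbered.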
Before Whisper, I used either Otter.ai or Premiere Pro to generate transcripts, made corrections in a Google Doc (necessary in my use case for version control, use of Grammarly, sharing for review, etc.), and then used the alignment features in Google Drive or YouTube to turn the full-text transcript into captions.
It seems like a Google Apps Script + the Whisper API could be used to handle all of this, but I’m curious what other solutions people use for this, or whether there are features in Whisper that I’ve overlooked.