Workflow for generating full-text transcript and captions with Whisper

I’m currently experimenting with Whisper to generate both full-text transcripts and captions for videos. My workflow looks like this:

  • Use whisper locally to generate SRT captions (timestamped) and TXT transcript (no timestamp).
  • Manually make the necessary corrections with a colleague (in Google Docs) to both the captions and the transcript.
  • Publish the captioned video (not forced / burned-in subs) plus a separate transcript document.

The problem is that I need to keep the SRT and TXT versions in sync, so every correction has to be made twice, once in each file/document.

Does whisper provide a workaround/feature like alignment for this? E.g.:

  1. Generate TXT transcript,
  2. Make corrections,
  3. Generate SRT with corrections and timestamps according to original file?
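
To make the three steps above concrete, here is a rough sketch of that kind of alignment in plain JavaScript. It is not a Whisper feature: `parseSrt`, `norm`, and `realignCaptions` are hypothetical helper names, and the approach (a word-level longest-common-subsequence match between the original captions and the corrected transcript) is only one assumed way to do it. Unchanged words stay in their original cue, and changed or inserted words go into the cue of the nearest original word that follows.

```javascript
// Parse SRT text into cues: { time: "00:00:00,000 --> 00:00:02,000", text: "..." }.
function parseSrt(srt) {
  return srt.trim().split(/\r?\n\s*\r?\n/).map(function (block) {
    var lines = block.split(/\r?\n/);
    return { time: lines[1], text: lines.slice(2).join(' ') };
  });
}

// Normalize a word for comparison: ignore case and common punctuation.
function norm(word) {
  return word.toLowerCase().replace(/[.,!?;:"'()]/g, '');
}

// Spread the words of the corrected transcript over the original cue timings.
function realignCaptions(srt, corrected) {
  var cues = parseSrt(srt);
  var orig = []; // one entry per original word: { key, cue }
  cues.forEach(function (cue, c) {
    cue.text.split(/\s+/).filter(Boolean).forEach(function (w) {
      orig.push({ key: norm(w), cue: c });
    });
  });
  var words = corrected.split(/\s+/).filter(Boolean);
  var n = orig.length, m = words.length, i, j;

  // lcs[i][j] = length of the longest common subsequence of orig[i..] and words[j..].
  var lcs = [];
  for (i = 0; i <= n; i++) { lcs.push(new Array(m + 1).fill(0)); }
  for (i = n - 1; i >= 0; i--) {
    for (j = m - 1; j >= 0; j--) {
      lcs[i][j] = orig[i].key === norm(words[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table and assign every corrected word to a cue.
  var texts = cues.map(function () { return []; });
  i = 0; j = 0;
  while (j < m) {
    if (i < n && orig[i].key === norm(words[j])) {
      texts[orig[i].cue].push(words[j]); // unchanged word keeps its cue
      i++; j++;
    } else if (i < n && lcs[i + 1][j] > lcs[i][j + 1]) {
      i++; // original word was removed or already replaced
    } else {
      // New or replacement word: use the cue of the current original word,
      // or the last cue if the original words are exhausted.
      var c = i < n ? orig[i].cue : cues.length - 1;
      texts[c].push(words[j]);
      j++;
    }
  }

  // Rebuild the SRT with the original timings and the corrected text.
  return cues.map(function (cue, c) {
    return (c + 1) + '\n' + cue.time + '\n' + texts[c].join(' ');
  }).join('\n\n') + '\n';
}
```

Calling `realignCaptions(originalSrt, correctedTranscript)` returns a new SRT string. The comparison table grows with (original words × corrected words), so this is only practical for fairly short videos, and a cue whose words were all deleted comes out with empty text.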

Prior to whisper I’ve used either or Premiere Pro to generate transcripts, made corrections in a Google Doc (necessary in my use case for version control, use of Grammarly, sharing for review, etc.), and then used the alignment features in Google Drive or YouTube to turn the full-text transcript into captions.

It seems like a Google Apps Script plus the Whisper API could handle all of this, but I’m curious what other solutions people use, or whether there are features in Whisper that I’ve overlooked.


If I understood you correctly, this can be solved on the Google Apps side. You just need Apps Script to replace the old text in the SRT file with the corrected text from the document, with a script like this:

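function updateSrtFile() {
  // Read the corrected text from the currently open Google Doc.
  var doc = DocumentApp.getActiveDocument();
  var body = doc.getBody();
  var text = body.getText();
  // Load the SRT file from Drive (replace the placeholder with your file's ID).
  var srtFile = DriveApp.getFileById('YOUR_SRT_FILE_ID');
  var srtContent = srtFile.getBlob().getDataAsString();
  var lines = srtContent.split('\n');
  for (var i = 0; i < lines.length; i++) {
    // Swap the placeholder text for the corrected text wherever it appears.
    if (lines[i].indexOf('text to replace') !== -1) {
      lines[i] = lines[i].replace('text to replace', text);
    }
  }
  srtContent = lines.join('\n');
  // Assumption: write the result back to the same Drive file.
  srtFile.setContent(srtContent);
}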