Send an hour's worth of audio through Whisper using node.js

Either chop it into 25 MB chunks and risk cutting it mid-word, or use pydub or a similar library and break it on a silence near the 25 MB boundary. (ref)
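A minimal sketch of the first option in Node (the `chunkFile` helper and its naming are my own, not from any library; mp3 decoders usually resync on the next frame header, but the word spanning each boundary may be lost):

```javascript
import fs from 'fs'

// Naive byte-level chunker: splits a file into fixed-size pieces.
// This can cut mid-word (and mid-frame); a decoder will usually resync
// on the next mp3 frame header, but audio at each boundary may be lost.
function chunkFile (inputPath, outputPrefix, chunkSize = 25 * 1024 * 1024) {
  const data = fs.readFileSync(inputPath)
  const paths = []
  for (let i = 0, n = 0; i < data.length; i += chunkSize, n++) {
    const outPath = `${outputPrefix}_${String(n).padStart(3, '0')}.mp3`
    fs.writeFileSync(outPath, data.subarray(i, i + chunkSize))
    paths.push(outPath)
  }
  return paths
}
```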


It’s a quasi-assumption that all AI-based forums use Python. :rofl:

Gosh, I really don’t know what open-source thing is out there. I would just risk it, personally. The real question you have to ask yourself: do you really care if one word out of 25 MB of data is off? I wouldn’t, since the model’s WER is higher than you think, and it messes up more than you think.

So just chop the files up in the most appropriate way in the language you want, IMO.


This uses ffmpeg-pac. I have not tried it, but I wrote it down in my notes; I probably found it somewhere here before:

Edit:
I tried to find the module ffmpeg-pac but I cannot find it, so the code was probably generated by ChatGPT, and I removed it. Anyway, the command-line way to split using ffmpeg is:

ffmpeg -i "input_audio_file.mp3" -f segment -segment_time 3600 -c copy output_audio_file_%03d.mp3

I tried to find another solution that does not use any external module/library. Here is something I tried and tested, assuming you have ffmpeg installed on the backend.

import path from 'path'
import { exec } from 'child_process'

const sourceAudio = path.join('public', 'audio', 'Amore.mp3')
const outputAudio = path.join('public', 'audio', 'Amore-segment_%03d.mp3')

const ret = await new Promise((resolve) => {
    // split into 120-second segments, copying the codec (no re-encode)
    const sCommand = `ffmpeg -i "${sourceAudio}" -f segment -segment_time 120 -c copy "${outputAudio}"`

    exec(sCommand, (error, stdout, stderr) => {
        if (error) {
            resolve({
                status: 'error',
                error: error.message,
            })
        } else {
            resolve({
                status: 'success',
                error: stderr,
                out: stdout,
            })
        }
    })
})

Result

Original:
Amore.mp3, 3:07, 4.4MB

Output:
Amore-segment_000.mp3, 2:00, 2.8MB
Amore-segment_001.mp3, 1:07, 1.6MB
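To complete the picture of sending the segments through Whisper, here is a hedged sketch for Node 18+ (which provides `fetch`, `FormData`, and `Blob` as globals). The `buildForm` and `transcribe` helper names are my own; the endpoint and `whisper-1` model name are the standard OpenAI transcription API:

```javascript
import fs from 'fs'
import path from 'path'

// Build the multipart form for one audio segment.
function buildForm (filePath) {
  const form = new FormData()
  form.append('model', 'whisper-1')
  form.append('file', new Blob([fs.readFileSync(filePath)]), path.basename(filePath))
  return form
}

// Send a segment to the Whisper transcription endpoint.
// Assumes OPENAI_API_KEY is set in the environment.
async function transcribe (filePath) {
  const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: buildForm(filePath)
  })
  return res.json() // e.g. { text: '...' }
}
```

You would call `transcribe` once per segment and concatenate the resulting texts.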

For reliable use of ffmpeg, note that the input file can contain other streams (for example, an m4a with an mjpeg cover-art video stream) and other metadata that wastes space and can corrupt the output; these must not be passed to the output.

Another thing you can do is recompress with ffmpeg.

I take a 64 kbps stereo mp3 and mash it with Opus in an OGG container down to 12 kbps mono, also using the speech optimizations. The command line is below:

ffmpeg -i audio.mp3 -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip audio.ogg

Opus has the highest quality at low bitrates, and is supported by Whisper in an OGG container.
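The same recompression can be scripted from Node; a sketch, with helper names of my own, that just shells out to the ffmpeg command above (requires ffmpeg built with libopus on the PATH):

```javascript
import { exec } from 'child_process'
import { promisify } from 'util'

const execAsync = promisify(exec)

// Build the recompression command: drop video streams (-vn), strip
// metadata, downmix to mono, encode 12 kbps Opus with the speech
// (voip) tuning. -y overwrites an existing output without prompting.
function buildOpusCommand (input, output) {
  return `ffmpeg -y -i "${input}" -vn -map_metadata -1 -ac 1 -c:a libopus -b:a 12k -application voip "${output}"`
}

// Run the recompression.
async function toOpus (input, output) {
  return execAsync(buildOpusCommand(input, output))
}
```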

(Conversion log)
Input #0, mp3, from 'audio.mp3':
  Duration: 00:00:27.74, start: 0.000000, bitrate: 64 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 64 kb/s
File 'audio12.ogg' already exists. Overwrite? [y/N] y
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> opus (libopus))
Press [q] to stop, [?] for help
Output #0, ogg, to 'audio12.ogg':
  Metadata:
    encoder         : Lavf59.17.100
  Stream #0:0: Audio: opus, 48000 Hz, mono, flt, 12 kb/s
    Metadata:
      encoder         : Lavc59.20.100 libopus
size=      43kB time=00:00:27.75 bitrate=  12.6kbits/s speed=48.9x

Comparing the two transcriptions, the re-encoded version (top) is actually more accurate at the start of the audio:

{
"text": "that this is a radio show where people call us and ask us questions about cars, right? And what were we just talking about before the mics came on? We were both talking about what's wrong with our respective vehicles. This has happened in the mind that charges their systems aren't working. It's pretty sad. Well, my real question is, who do we call? Who do we call? I call you when I have a problem."
}
{
"text": "This is a radio show where people call us and ask us questions about cars, right? And what were we just talking about before the mics came on? We were both talking about what's wrong with our respective vehicles. This has happened in the mind that charges their systems aren't working. It's pretty sad. Well, the real question is, who do we call? Who do we call? I call you when I have a problem."
}

Encoding 3.5 hours of Howard Stern AAC to Opus (which would be a $1.25 transcript) takes it from 86 MB to 19 MB. Stripping the extra streams as above was required to make it play in foobar2000, and leaves more bits for the audio. (PS: don't actually send a file this long; you'll likely get an API timeout.)


Thanks.
But I have a question.
Can I use file size instead of time in this command?
OpenAI Whisper limits the file size to 25 MB, so I need to split the large audio file into chunks. If I could use a file-size value instead of a time value, that would be great.
Please help me.

Due to the nature of the data (audio), it is more logical to split your file by time, and there is no direct way to split by size using ffmpeg. But in my experience, the chunks output by the time approach have similar file sizes. So I would suggest that you approximate how much time fits within 25 MB for your data and just use that.
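That approximation can be sketched as a back-of-the-envelope calculation in Node. The `segmentSeconds` helper and the 0.95 safety margin are my own choices, and it assumes a roughly constant bitrate:

```javascript
// Estimate a -segment_time value that keeps each chunk under the
// 25 MB Whisper limit, assuming a roughly constant bitrate.
function segmentSeconds (totalBytes, totalSeconds, limitBytes = 25 * 1024 * 1024) {
  const bytesPerSecond = totalBytes / totalSeconds
  // 0.95 is an arbitrary safety margin for bitrate variation
  return Math.floor((limitBytes / bytesPerSecond) * 0.95)
}

// A one-hour 64 kb/s mp3 is about 8000 bytes/s, giving roughly
// 3112 seconds (~52 minutes) per chunk.
```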

Thanks very much for your help.
I will ping you later if I have any other questions.
Thanks again.