Whisper API rejecting MP4 from Safari - but works with WebM on Chrome and Edge

Hello! I’m using React.js and Node.js to complete a POC for a client. Currently, I need to capture audio in the browser and send it to a backend server for quite a bit of processing.

Without going into too much detail, here is the problem. When I use the record button in Safari (in my React app, it starts the browser audio API) and send the recording over to the BE, I get a 400 saying the file is not in the expected format, followed by an array of accepted formats. However, when I console.log the format of the file, it shows audio/mp4, which IS one of the accepted file types in that array.

I’ve tried some of the fixes suggested in previous threads on this issue, such as mediaRecorder.start(1000) for Safari browsers, but I continue to receive the same error.

Am I going to have to create a special case on my BE just for Safari browsers using a conversion library, or is there a simple workaround I can implement on the client side to keep the ball rolling without adding a bunch of extra code?

Start function for reference →

const startRecording = () => {
    console.log("Starting recording");
    if (!conversationStarted) {
        setConversationStarted(true);
        setElapsedTime(0); 
        setTimer(null);
    }
    if (mediaRecorder && mediaRecorder.state === "inactive") {
        mediaRecorder.start(); // <--- Is this the culprit? Do I need to check if the browser is Safari and change it to start(1000)?
        setIsRecording(true);
        const startTime = new Date().getTime();
        setStartTime(startTime);
        setAudioURL('');

        const intervalId = setInterval(() => {
            const now = new Date().getTime();
            const elapsed = Math.floor((now - startTime) / 1000);
            setElapsedTime(elapsed);
        }, 1000);
        setIntervalId(intervalId);
    }
};
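
One client-side check that may help while debugging is confirming which container the browser actually records, and making sure the uploaded file name carries the matching extension. The sketch below is only an illustration: pickRecordingFormat, uploadFileName and the CANDIDATES list are hypothetical names, and nothing in this thread confirms that this alone makes the API accept Safari's MP4. MediaRecorder.isTypeSupported is the standard browser call; it is injected as a parameter so the helper can also run outside a browser.

```javascript
// Hypothetical candidate list: containers to try, in order of preference.
const CANDIDATES = [
  { mimeType: "audio/webm", extension: "webm" },
  { mimeType: "audio/mp4", extension: "mp4" },
];

// Return the first candidate the browser reports it can record, or null.
// `isSupported` is a predicate such as (t) => MediaRecorder.isTypeSupported(t).
function pickRecordingFormat(isSupported) {
  return CANDIDATES.find((c) => isSupported(c.mimeType)) || null;
}

// Build a file name whose extension matches the recorded container.
function uploadFileName(format, baseName = "recording") {
  return `${baseName}.${format.extension}`;
}

// Browser usage (assumes an existing `stream` and `formData`; not executed here):
// const format = pickRecordingFormat((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, { mimeType: format.mimeType });
// const chunks = [];
// recorder.ondataavailable = (e) => chunks.push(e.data);
// recorder.onstop = () => {
//   const blob = new Blob(chunks, { type: format.mimeType });
//   formData.append("file", blob, uploadFileName(format));
// };
```

Logging the chosen format next to the server's 400 response at least shows whether the rejected uploads are always the audio/mp4 ones.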

If it is possible in your case, try processing the received audio data on the BE with an ffmpeg command. For example, here is a command that removes the silent parts of your audio:

ffmpeg -i audio.mp3 -af silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-50dB audio_output.mp3

Use exec (from Node's child_process module) to run it on the backend.

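const { exec } = require('child_process');

// Must run inside an async function. The promise always resolves with a
// status object, so the caller checks ret.status instead of using try/catch.
const ret = await new Promise((resolve) => {

        // Quote the paths so file names containing spaces survive the shell.
        const sCommand = `ffmpeg -i "${origAudioFile}" -af silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-50dB "${outAudioFile}"`

        exec(sCommand, (error, stdout, stderr) => {

            if (error) {

                resolve({
                    status: 'error',
                    error: error,
                })

            } else {

                // ffmpeg writes its log to stderr even when it succeeds.
                resolve({
                    status: 'success',
                    error: stderr,
                    out: stdout,
                })

            }

        })

})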
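
The same pattern could be adapted to re-encode a Safari MP4 upload into another accepted format before it goes to Whisper. This is a minimal sketch, not a confirmed fix: buildConvertArgs and convertToMp3 are hypothetical helper names, MP3 is just one possible target, and it assumes an ffmpeg binary on the PATH that was built with an MP3 encoder. It uses execFile rather than exec so the file paths are passed as arguments and never interpreted by a shell.

```javascript
const { execFile } = require("child_process");

// Build the ffmpeg argument list: overwrite output (-y), read the input (-i),
// drop any video stream (-vn), and let ffmpeg pick the codec from the
// output file's extension.
function buildConvertArgs(inputPath, outputPath) {
  return ["-y", "-i", inputPath, "-vn", outputPath];
}

// Run ffmpeg and resolve with the output path, or reject with ffmpeg's log.
function convertToMp3(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    execFile("ffmpeg", buildConvertArgs(inputPath, outputPath), (error, stdout, stderr) => {
      if (error) {
        reject(new Error(`ffmpeg failed: ${stderr || error.message}`));
      } else {
        resolve(outputPath);
      }
    });
  });
}

// Example (paths are placeholders):
// convertToMp3("upload.mp4", "upload.mp3").then((file) => console.log("converted", file));
```

Running the conversion only when the uploaded file's type is audio/mp4 would leave the WebM uploads from Chrome and Edge untouched.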

Yes, thank you! That’s currently how I have it set up.

I was hoping to be able to do something on the FE to avoid having to do that. I saw some solutions that set the chunking time to 1000 for Safari browsers, and I tried that, but it still didn’t like it.

I guess for now I will be converting with ffmpeg for Safari / MP4 until Whisper finds a way around that.

Thank you!