Whisper API cannot read files correctly

None of the previous solutions mentioned in this thread worked for my environment (Next.js with the App Router). After some trial and error I finally figured it out, and I thought I'd post my solution here for others who are still struggling.

Using the vanilla Safari MediaRecorder API worked for recording audio/mp4 blobs, but sending them to the Whisper API always gave me transcripts like "Hello", "Thank you", or "Bye.", no matter what the recording actually contained. That was even after @michellep posted that the backend had been updated.

Using mediaRecorder.start(1000) didn't work for me either: with a timeslice, the recorder emits a partial chunk every second, so I was just uploading an incomplete blob after each second of recording. This aligns with the Mozilla docs, and it's actually a mystery to me how other people made it work with that setting.
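For context, the timeslice chunks only make sense concatenated in order into a single blob on stop; uploading each chunk on its own gives Whisper a truncated file. A minimal sketch of the collect-then-merge pattern (the function names here are mine, not from the post):

```typescript
// Merge the partial chunks MediaRecorder emits into one complete Blob.
// The mime type must match what the recorder actually produced.
function mergeChunks(chunks: Blob[], mimeType: string): Blob {
  return new Blob(chunks, { type: mimeType })
}

// Browser-only sketch: collect every timeslice chunk, resolve with the
// full recording once the recorder stops.
async function recordFiveSeconds(): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const recorder = new MediaRecorder(stream)
  const chunks: Blob[] = []

  recorder.ondataavailable = (event) => {
    // Each event carries a partial chunk, not a standalone playable file.
    if (event.data.size > 0) chunks.push(event.data)
  }

  return new Promise((resolve) => {
    recorder.onstop = () => resolve(mergeChunks(chunks, recorder.mimeType))
    recorder.start(1000) // timeslice: fires ondataavailable every second
    setTimeout(() => recorder.stop(), 5000)
  })
}
```

Only the merged blob should be sent to the transcription endpoint.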

I also tried RecordRTC.js and eventually got it working with audio/wav, but I wasn't satisfied with that solution because the blobs are much larger than with audio/webm or audio/mpeg.

Solution that worked for me

Other solutions above mentioned audio-recorder-polyfill, which is hard to use in a Next.js App Router environment due to server-side rendering by default. Even "use client" wouldn't do the trick as it usually does. But I finally found what I had to do to make it work:

In the parent component that needs the recording button, I’m importing the recording button component like so:

// MyComponent.tsx
import dynamic from "next/dynamic"
import React from "react"

// Define the dynamic import at module scope (not inside the component body),
// so the component isn't re-created and remounted on every render.
// ssr: false keeps the browser-only recording code out of server rendering.
const RecordingButton = dynamic(() => import("./RecordingButton"), { ssr: false })

export default function MyComponent() {
  return (
    <div>
      {/* other stuff */}
      <RecordingButton />
    </div>
  )
}

And then inside the RecordingButton component I'm only importing the polyfill if audio/webm isn't supported by the browser:

// RecordingButton.tsx

const supportsWebm = typeof MediaRecorder !== "undefined" && MediaRecorder.isTypeSupported("audio/webm")

if (!supportsWebm) {
  // Dynamically import the polyfill if 'audio/webm' is not supported
  Promise.all([import("audio-recorder-polyfill"), import("audio-recorder-polyfill/mpeg-encoder")])
    .then(([AudioRecorderModule, mpegEncoderModule]) => {
      const AudioRecorder = AudioRecorderModule.default
      const mpegEncoder = mpegEncoderModule.default

      AudioRecorder.encoder = mpegEncoder
      AudioRecorder.prototype.mimeType = "audio/mpeg"
      window.MediaRecorder = AudioRecorder
    })
    .catch((error) => {
      console.error("Error importing polyfill:", error)
    })
}

After that I was able to just use the regular browser MediaStream Recording API (you can ask ChatGPT how to use it from there).

I like this solution best because I still get to use compressed formats instead of .wav, and I can keep using the regular MediaStream API.

P.S.: Unfortunately I had to exclude all links to docs and libraries. It would be nice if links were enabled to allow higher-quality posts.