MediaRecorder API w/ Whisper not working on mobile browsers

I’m using the MediaRecorder API to record voice in the browser. It works well on my laptop; on my phone, however, I don’t get the correct transcription.

Initially, on my iPhone, starting and stopping a recording did nothing, so I tried changing the audio format from audio/webm to audio/mpeg. That got my app to return the conversation between me and the AI, but the results are still wrong.
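
A sketch of how the format could be detected instead of hard-coded (the candidate list is an assumption about what mobile Safari and Chrome each support):

// Ask the browser which audio container its MediaRecorder can actually
// produce. Safari on iOS typically records audio/mp4 (AAC), while
// Chrome and Firefox record audio/webm (Opus).
const pickSupportedMimeType = (): string | undefined =>
  ["audio/webm", "audio/mp4", "audio/mpeg"].find((t) =>
    MediaRecorder.isTypeSupported(t)
  );

const createRecorder = async (): Promise<MediaRecorder> => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickSupportedMimeType();
  // Fall back to the browser default if nothing in the list is supported
  return new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
};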

Sometimes it says I said “MBC 뉴스 이덕영입니다” (which oddly translates to “This is Lee Deok-young from MBC News.”) or something random like “Bye!”

I’m using Next.js. Here’s my code for recording voice:

const startRecording = async () => {
    // Request access to the user's microphone
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Create a new MediaRecorder instance with the audio stream
    mediaRecorder.current = new MediaRecorder(stream);
    const chunks: Blob[] = [];

    // Event handler to collect audio data as it becomes available
    mediaRecorder.current.ondataavailable = (e) => chunks.push(e.data);

    // Event handler for when recording stops
    mediaRecorder.current.onstop = async () => {
      // Combine all audio chunks into a single blob
      const audioBlob = new Blob(chunks, { type: "audio/wav" });
      // Create FormData to send the audio file to the server
      const formData = new FormData();
      formData.append("audio", audioBlob, "recording.wav");

      // Send the audio file to the server for processing
      console.log("Sending audio blob:", audioBlob);
      const response = await fetch("/api/process-audio", {
        method: "POST",
        body: formData,
      });
      const data = await response.json();
      console.log("Server response:", data);
    };

    // Start recording; data is delivered to ondataavailable
    mediaRecorder.current.start();
  };

and here’s my code for processing the audio:

import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment
const openai = new OpenAI();

export async function POST(request: Request) {
  // Extract the audio file from the incoming request
  const formData = await request.formData();
  const audioFile = formData.get("audio") as File;

  // Step 1: Transcribe the audio using OpenAI's Whisper model
  console.log("Audio file received:", audioFile)
  const transcription = await openai.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-1",
  });
  console.log("Transcription result:", transcription.text);

Is this the best way to record audio from the microphone of a user’s phone? It would be amazing to get it working as smoothly as the ChatGPT app, or at least to get the transcript right using this API. How can I resolve this issue?

Thanks!

LOL, I was getting the exact same “MBC” gibberish when I was setting up an interface (on my laptop) to integrate a custom GPT with a custom ElevenLabs voice. It ended up being a trigger-key issue: if I accidentally hit the record trigger key while typing, it would do the weird Korean thing. Earlier I also had an issue where it would transcribe just “you” and nothing else, no matter what I said. That one turned out to be a microphone issue; once I fixed the mic settings, it was fine.

I’m using OpenAI’s Whisper API for voice-to-text transcription too, but I haven’t tried anything on my phone yet, so sorry if that’s not helpful.

Thanks for sharing your experience. Sadly, that’s not helpful here, as I’m not hitting any triggers and my microphone has no issues. I hope someone has some insights. Basically, the Whisper API is not transcribing my audio file correctly, even though the format is supported and I’ve tried multiple formats.

Update: my issue was resolved by passing a one-second timeslice, mediaRecorder.current.start(1000), so the recorder delivers a chunk every 1000 ms instead of a single chunk at stop.
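
For anyone hitting the same thing, a minimal sketch of where that call goes (same setup as the recording code above):

const startChunkedRecording = async (): Promise<MediaRecorder> => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const chunks: Blob[] = [];
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (e) => chunks.push(e.data);
  // The 1000 ms timeslice makes the recorder emit a chunk every second
  // instead of a single chunk when stop() is called
  recorder.start(1000);
  return recorder;
};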

I had exactly the same error. Thanks for sharing your solution. However, I’m developing real-time voice translation that can’t tolerate added delay, so I don’t think it will be easy to use this method.

We get exactly the same weird transcription, “MBC 뉴스 이덕영입니다.” (“This is Lee Deok-young from MBC News.”), when recording silence or only background noise. What is this default transcript?

In my case (Amazon Transcribe rather than Whisper), MediaRecorder on iOS calls ondataavailable(event) with an empty event.data, and after that the transcription stops working. Filtering out these empty chunks fixed the issue. Hope this helps.
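
A minimal sketch of that filter, assuming the same recorder/chunks setup as the code earlier in the thread:

const attachChunkHandler = (recorder: MediaRecorder, chunks: Blob[]) => {
  recorder.ondataavailable = (e: BlobEvent) => {
    // iOS Safari sometimes fires dataavailable with an empty Blob;
    // dropping those chunks keeps the final recording usable
    if (e.data && e.data.size > 0) {
      chunks.push(e.data);
    }
  };
};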