MediaRecorder API w/ Whisper not working on mobile browsers

I’m using the MediaRecorder API to record voice in the browser. It works well on my laptop; on my phone, however, I don’t get the correct transcription.

Initially, on my iPhone, starting and stopping a recording did nothing, so I tried changing the audio format from audio/webm to audio/mpeg. That got my app to return the conversation between me and the AI, but the results are still wrong.
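
A sketch of how the format could be detected instead of hard-coded (the candidate list is an assumption about what mobile Safari and Chrome each support):

// Ask the browser which audio container its MediaRecorder can actually
// produce. Safari on iOS typically records audio/mp4 (AAC), while
// Chrome and Firefox record audio/webm (Opus).
const pickSupportedMimeType = (): string | undefined =>
  ["audio/webm", "audio/mp4", "audio/mpeg"].find((t) =>
    MediaRecorder.isTypeSupported(t)
  );

const createRecorder = async (): Promise<MediaRecorder> => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickSupportedMimeType();
  // Fall back to the browser default if nothing in the list is supported
  return new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
};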

Sometimes it says I said “MBC 뉴스 이덕영입니다” (which oddly translates to “This is Lee Deok-young from MBC News.”) or something random like “Bye!”

I’m using Next.js. Here’s my code for recording voice:

const startRecording = async () => {
    // Request access to the user's microphone
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Create a new MediaRecorder instance with the audio stream
    mediaRecorder.current = new MediaRecorder(stream);
    const chunks: Blob[] = [];

    // Event handler to collect audio data as it becomes available
    mediaRecorder.current.ondataavailable = (e) => chunks.push(e.data);

    // Event handler for when recording stops
    mediaRecorder.current.onstop = async () => {
      // Combine all audio chunks into a single blob
      const audioBlob = new Blob(chunks, { type: "audio/wav" });
      // Create FormData to send the audio file to the server
      const formData = new FormData();
      formData.append("audio", audioBlob, "recording.wav");

      // Send the audio file to the server for processing
      console.log("Sending audio blob:", audioBlob);
      const response = await fetch("/api/process-audio", {
        method: "POST",
        body: formData,
      });
      const data = await response.json();
      console.log("Server response:", data);
    };

    // Start recording; data is delivered to ondataavailable
    mediaRecorder.current.start();
  };

and here’s my code for processing the audio:

import OpenAI from "openai";

// The client reads OPENAI_API_KEY from the environment
const openai = new OpenAI();

export async function POST(request: Request) {
  // Extract the audio file from the incoming request
  const formData = await request.formData();
  const audioFile = formData.get("audio") as File;

  // Step 1: Transcribe the audio using OpenAI's Whisper model
  console.log("Audio file received:", audioFile)
  const transcription = await openai.audio.transcriptions.create({
    file: audioFile,
    model: "whisper-1",
  });
  console.log("Transcription result:", transcription.text);

Is this the best way to record audio from the microphone of a user’s phone? It would be amazing to get it working as smoothly as the ChatGPT app, or at least to get the transcript right using this API. How can I resolve this issue?

Thanks!

LOL, I was getting the exact same “MBC” gibberish when I was setting up an interface (on my laptop) to integrate a custom GPT with a custom ElevenLabs voice. It ended up being a trigger-key issue: if I accidentally hit the record trigger key while typing, it would do the weird Korean thing. Earlier I also had an issue where it would transcribe just “you” and nothing else, no matter what I said. That one turned out to be a microphone issue; once I fixed the mic settings, it was fine.

I’m using OpenAI’s Whisper API for voice-to-text transcription too, but I haven’t tried anything on my phone yet, so sorry if that’s not helpful.

Thanks for sharing your experience. Sadly, that’s not helpful here, as I’m not hitting any triggers and my microphone has no issues. I hope someone has some insights. Basically, the Whisper API is not transcribing my audio file correctly, even though the format is supported and I’ve tried multiple formats.

Update: my issue was resolved by passing a one-second timeslice, mediaRecorder.current.start(1000), so the recorder delivers a chunk every 1000 ms instead of a single chunk at stop.
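
For anyone hitting the same thing, a minimal sketch of where that call goes (same setup as the recording code above):

const startChunkedRecording = async (): Promise<MediaRecorder> => {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const chunks: Blob[] = [];
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (e) => chunks.push(e.data);
  // The 1000 ms timeslice makes the recorder emit a chunk every second
  // instead of a single chunk when stop() is called
  recorder.start(1000);
  return recorder;
};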

I had exactly the same error. Thanks for sharing your solution. However, I’m developing real-time voice translation that can’t tolerate added delay, so I don’t think it will be easy to use this method.

We get exactly the same weird transcription, “MBC 뉴스 이덕영입니다.” (“This is Lee Deok-young from MBC News.”), when recording silence or only background noise. What is this default transcript?

In my case (Amazon Transcribe rather than Whisper), MediaRecorder on iOS calls ondataavailable(event) with an empty event.data, and after that the transcription stops working. Filtering out these empty chunks fixed the issue. Hope this helps.
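
A minimal sketch of that filter, assuming the same recorder/chunks setup as the code earlier in the thread:

const attachChunkHandler = (recorder: MediaRecorder, chunks: Blob[]) => {
  recorder.ondataavailable = (e: BlobEvent) => {
    // iOS Safari sometimes fires dataavailable with an empty Blob;
    // dropping those chunks keeps the final recording usable
    if (e.data && e.data.size > 0) {
      chunks.push(e.data);
    }
  };
};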