Hi Daniel, I am facing the same issue. It’s working on the simulator but not on an iPhone, where the audio format is m4a. Please share if you fixed it.
Hi, I am also facing the same issue using the record library for Flutter on iOS.
Here is my code for recording:
await record.start(
  path: path,
  encoder: AudioEncoder.aacLc, // by default
  bitRate: 128000, // by default
  samplingRate: 44100, // by default
);
After that, I upload the file to Google Drive to trigger a pipeline which invokes the Whisper API:
// Here `File` is the Drive metadata class from package:googleapis/drive/v3.dart,
// while `file` below is the recorded audio file on disk.
final DriveApi driveApi = await _getDriveApi();
File fileToUpload = File();
fileToUpload.parents = [_googleDriveFolderId];
fileToUpload.name = '${file.getFileName()}.m4a';
fileToUpload.mimeType = 'audio/mp4';
try {
  await driveApi.files.create(
    fileToUpload,
    uploadMedia: Media(file.openRead(), file.lengthSync()),
  );
} catch (_) {
  print('Failed to upload to Google Drive');
}
Then I get the same invalid file format issue. Can anyone please give some advice? Thank you
Dear All,
To overcome the iOS audio issue, I used the “mic-recorder-to-mp3” npm package. It allows for seamless audio recording and ensures compatibility across platforms, including iOS.
Since it records your microphone input and produces an audio/mp3 file, there is no file format issue at all.
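For reference, here is a minimal recording sketch with mic-recorder-to-mp3 as I understand its API (setBlobURL is a placeholder for your own React state setter, not part of the package):

import MicRecorder from "mic-recorder-to-mp3";

// One recorder instance for the component; bitRate is in kbps per the package docs
const recorder = new MicRecorder({ bitRate: 128 });

const startRecording = () => recorder.start();

const stopRecording = () =>
  recorder
    .stop()
    .getMp3()
    .then(([buffer, blob]) => {
      // blob is audio/mp3, so the format is accepted regardless of the browser
      setBlobURL(URL.createObjectURL(blob)); // setBlobURL: your own state setter
    })
    .catch((e) => console.error(e));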
Once you get the blob URL, convert it to base64 using the code below:
useEffect(() => {
  if (blobURL != null) {
    fetch(blobURL)
      .then(response => response.arrayBuffer())
      .then(arrayBuffer => {
        const base64Data = btoa(new Uint8Array(arrayBuffer).reduce((data, byte) => data + String.fromCharCode(byte), ''));
        const rawData = `base64,${base64Data}`;
        fetchData(rawData);
      })
      .catch(error => console.error(error));
  }
}, [blobURL]);
Then use your transcription API to get the text back from the base64 data above.
With this approach there is no invalid file format issue on iOS.
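For illustration, a rough sketch of what that server-side step could look like on Node 18+ (the rawData prefix matches the snippet above; the exact backend shape and OPENAI_API_KEY handling are assumptions on my part):

async function transcribeBase64(rawData: string): Promise<string> {
  // rawData arrives as "base64,<data>" from the client snippet above
  const base64 = rawData.replace(/^base64,/, "");
  const audioBuffer = Buffer.from(base64, "base64");

  const formData = new FormData();
  formData.append("file", new Blob([audioBuffer], { type: "audio/mp3" }), "audio.mp3");
  formData.append("model", "whisper-1");

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
    body: formData, // fetch sets the multipart boundary itself
  });
  const json = await res.json();
  return json.text;
}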
Thanks, the polyfill suggestion fixed my issue with using MediaRecorder audio in Vite.
Hi all, stumbled upon this in my googling and believe I was having the same Safari problem. I would get one-word or empty responses from the API.
I can’t explain it, but when I pass a timeslice parameter to mediaRecorder.start(), Safari seems to produce files that work. This is with mimeType: ‘audio/mp4’.
mediaRecorder.start(1000);
I just figured this out, have not tested extensively. Definitely would be curious if anyone can explain lol.
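For anyone who wants the surrounding context, here is a rough sketch of how I mean it (the function and callback names are placeholders; the start(1000) timeslice is the actual point):

async function startSafariRecording(onFile: (blob: Blob) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/mp4" });
  const chunks: Blob[] = [];

  mediaRecorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  mediaRecorder.onstop = () => {
    // Name this audio.mp4 when appending it to FormData for /v1/audio/transcriptions
    onFile(new Blob(chunks, { type: "audio/mp4" }));
  };

  // Passing a 1000 ms timeslice is what made Safari produce usable files for me
  mediaRecorder.start(1000);
  return mediaRecorder;
}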
I just wanted to point out that this is still an issue. I see it is marked with a solution but it is still a problem.
I am using this code, and it only fails to work on Safari on iPhones. I always get Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
const audioBlob = new Blob(this.recordedChunks, { type: "audio/mp3" });
const formData = new FormData();
const file = new File([audioBlob], "audio.mp3", { type: "audio/mp3" });
formData.append("file", file, "audio.mp3");
formData.append("model", "whisper-1");
axios
  .post("https://api.openai.com/v1/audio/transcriptions", formData, {
    headers: {
      Authorization: `Bearer ${import.meta.env.VITE_OPENAI_API_KEY}`,
      "Content-Type": "multipart/form-data"
    }
  })
  .then((response) => {
    const userResponse = response?.data?.text;
It also doesn’t work if I change these lines:
const file = new File([audioBlob], "audio.mp3", { type: "audio/mp3" });
formData.append("file", file, "audio.mp3");
to
formData.append("file", audioBlob, "audio.mp3");
Check my post above. Curious if it works for you too.
Hey, super old post, but I’m getting desperate while struggling with this exact same problem right now.
I can’t figure out how to get the [audio-recorder-polyfill] code included in my sveltekit vite project.
I’ve tried the ‘quick hack’ suggestion of including
import AudioRecorder from 'https://cdn.jsdelivr.net/npm/audio-recorder-polyfill/index.js'
window.MediaRecorder = AudioRecorder
I’ve also tried downloading that file and putting it and the associated code in my /lib folder, but no go.
I’m beginning to think it’s something very specific to my Sveltekit preprocessor, but I don’t have enough expertise to figure out what’s causing the issue. The error message is extremely unhelpful.
My code in my front end looks like
onMount(() =>{
import MediaRecorder from '../../lib/audio-recorder-polyfill//index.js'
window.MediaRecorder = MediaRecorder
})
And the error is popping up as
[plugin:vite-plugin-svelte] Error while preprocessing /home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte - Transform failed with 1 error:
/home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte:12:9: ERROR: Unexpected "MediaRecorder"
/home/jared/nomnomnom/nomnomnom/src/routes/home/+page.svelte
Unexpected "MediaRecorder"
10 |
11 | onMount(() =>{
12 | import MediaRecorder from '../../lib/audio-recorder-polyfill//index.js'
| ^
13 | window.MediaRecorder = MediaRecorder
14 | })
I have a project that was working perfectly in desktop browser and it’s now falling apart on mobile because of the mp4 → whisper problem.
Replying to my own post in case this helps anyone. After hours of poking at this I finally got it. In SvelteKit you either need to disable SSR for your page, or you can load the module in an onMount block:
onMount(async () => {
  const { default: AudioRecorder } = await import('audio-recorder-polyfill')
  window.MediaRecorder = AudioRecorder
})
I’m finally able to submit microphone recordings for transcription from iOS!
mediaRecorder.start(1000);
This fixed it for me, after weeks of trying other fixes! Thanks so much.
None of the solutions mentioned earlier in this thread worked for my environment in Next.js using the App Router. But now, finally, after some trial and error, I figured it out and thought I’d post my solution here as well for others who are still struggling.
Using the vanilla Safari MediaRecorder API worked to record audio/mp4 blobs, but sending them to the Whisper API always gave me transcripts like “Hello”, “Thank You” or “Bye”, no matter what the content of the recording was. That’s even after @michellep posted that the backend was updated.
Using mediaRecorder.start(1000) didn’t work for me; it would just upload the recorded blob after each second of recording, which aligns with the Mozilla docs. It’s actually a mystery to me how other people made it work with that setting.
I also tried recordRTC.js and got that working eventually using audio/wav, but wasn’t satisfied with this solution as the blobs are way larger than with audio/webm or audio/mpeg.
Solution that worked for me
Other solutions above mentioned audio-recorder-polyfill, which is hard to use in a Next.js App Router environment due to the server-side rendering by default. Even ‘use client’ wouldn’t do the trick as it usually does. But now I finally found what I had to do to make it work:
In the parent component that needs the recording button, I’m importing the recording button component like so:
// MyComponent.tsx
import dynamic from "next/dynamic"
import React from "react"

export default function MyComponent() {
  const RecordingButton = dynamic(() => import("./RecordingButton"), { ssr: false })
  return (
    <div>
      {/* other stuff */}
      <RecordingButton />
    </div>
  )
}
And then inside the RecordingButton component I’m only importing the polyfill if audio/webm isn’t supported by the browser:
// RecordingButton.tsx
const supportsWebm = typeof MediaRecorder !== "undefined" && MediaRecorder.isTypeSupported("audio/webm")

if (!supportsWebm) {
  // Dynamically import the polyfill if 'audio/webm' is not supported
  Promise.all([import("audio-recorder-polyfill"), import("audio-recorder-polyfill/mpeg-encoder")])
    .then(([AudioRecorderModule, mpegEncoderModule]) => {
      const AudioRecorder = AudioRecorderModule.default
      const mpegEncoder = mpegEncoderModule.default
      AudioRecorder.encoder = mpegEncoder
      AudioRecorder.prototype.mimeType = "audio/mpeg"
      window.MediaRecorder = AudioRecorder
    })
    .catch((error) => {
      console.error("Error importing polyfill:", error)
    })
}
After that I was able to just use the regular browser MediaStream Recording API (you can just ask ChatGPT how to use that from here on).
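For completeness, a rough sketch of that recording flow once the polyfill is registered (sendToWhisper is a placeholder for whatever upload function you use, not part of any library):

// Minimal flow once window.MediaRecorder is (possibly) the polyfill
async function record(sendToWhisper: (file: File) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  const recorder = new MediaRecorder(stream) // polyfilled on Safari, native elsewhere
  const chunks: Blob[] = []

  recorder.ondataavailable = (e) => chunks.push(e.data)
  recorder.onstop = () => {
    // audio/webm natively, audio/mpeg when the polyfill is active
    const ext = recorder.mimeType === "audio/mpeg" ? "mp3" : "webm"
    sendToWhisper(new File(chunks, `recording.${ext}`, { type: recorder.mimeType }))
  }

  recorder.start()
  return recorder // call recorder.stop() when the user is done talking
}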
I like this solution best, because I still get to use compressed formats and don’t have to use .wav and also I can just use the regular MediaStream API.
P.S.: Unfortunately I had to exclude all links to docs and libraries. It would be nice if links were enabled, to allow for higher-quality posts.
@jonnylangefeld 's solution initially worked for me, thanks for that.
However, it has a bug in a progressive web app (PWA) context on iOS Safari. It initially works, but after putting the app in the background and bringing it back to the foreground it no longer works (despite reinitialising anything that could potentially be reinitialised): the recording blob is empty. I would not be surprised if there were similar issues in a webview as well, though that’s just a guess.
However, the solution detailed by @neet.kes, i.e. converting to an mp3 directly on the device, for instance with mic-recorder-to-mp3, works.
To be clear, I think that’s potentially an issue with the polyfill, not with your specific implementation.
Hi, I’m adding myself to the list of people suffering from this problem (as of November 27th, 2023).
Like xacto pointed out, this is still an issue even though the thread is marked as resolved. cc @michellep
I’ve gone through this thread and others and observed the same behaviour described by 0x41mmar (the .mp3 file encoded on Safari, on both iOS and macOS, always returns the error Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg'], even though the audio file plays fine on my machine. I also tried .webm, same issue).
My setup:
- a Next.js app (v14) built from the vercel/ai/next-openai boilerplate from Vercel
- MediaRecorder client-side to record the audio
- openai.audio.transcriptions.create() to send the audio to OpenAI server-side, on the edge runtime (a rough sketch of that call is below)
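For reference, the server-side call looks roughly like this; treat it as a simplified sketch rather than my exact code (the route path and request parsing are just examples):

// app/api/transcribe/route.ts (illustrative path)
import OpenAI from "openai"

export const runtime = "edge"

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export async function POST(req: Request) {
  const formData = await req.formData()
  const audio = formData.get("file") as File // the recorded blob from the client

  const transcription = await openai.audio.transcriptions.create({
    file: audio,
    model: "whisper-1",
  })

  return Response.json({ text: transcription.text })
}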
I don’t want to invest in a solution using ffmpeg as I’m just making a prototype at the moment.
I tried the solution proposed by @jonnylangefeld, but somehow the audio-recorder-polyfill just returns a silent file for me (the length is correct, but the input sound is not encoded). I still need to try a couple of things that have been mentioned here.
Overall I find this very frustrating because everyone in this thread seems to be using various workarounds to the same problem, which sounds like it could be resolved in a more straightforward way on OpenAI’s end.
I am also facing this issue today, Dec 1, 2023. I’m using React (once this is working and I’m ready to deploy it, I’ll use a separate JS backend as recommended for security purposes). Here’s my code.
Setting up the stream:
navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then((currentStream) => {
    setStream(currentStream);
    // show the webcam/microphone info in the video for the current user
    if (myVideo.current) {
      myVideo.current.srcObject = currentStream;
    }
    // use the MediaStream Recording API
    const mediaRecorder = new MediaRecorder(currentStream)
    setMediaRecorder(mediaRecorder);
    // make the dataavailable event fire every 20 seconds
    mediaRecorder.start(20000);
    mediaRecorder.ondataavailable = handleDataAvailable;
    mediaRecorder.onstop = handleStop;
  })
  .catch((err) => {
    alert(`The following error occurred: ${err}`);
  });
Handle data available:
const handleDataAvailable = (event: BlobEvent) => {
  console.log("#handleDataAvailable");
  // get the Blob from the event
  const eventBlob = event.data
  if (eventBlob && eventBlob.size > 0) {
    const totalSpeechToTextChunksRecorded = speechToTextChunks.concat(eventBlob);
    setSpeechToTextChunks(totalSpeechToTextChunksRecorded);
    // Send to Whisper API
    const recordingFileName = `${recordingFilePrefix}-${totalSpeechToTextChunksRecorded.length}.mp4`;
    console.log('=========================')
    console.log(`recordingFileName: ${recordingFileName}`)
    //var audioFile = new File([eventBlob], recordingFileName, {type: "video/mp4"});
    api.transform.speechToText(eventBlob, recordingFileName)
      .then(transcription => {
        console.log(transcription.text)
      })
  }
}
I always get Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg'], even though I set the MIME type and append the .mp4 file extension to the filename as suggested.
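In case it helps others debug the same path, this is roughly what the wrapping looks like before the request goes out (api.transform.speechToText is my own wrapper, so the snippet below is only a simplified guess at what it does):

// Simplified sketch of what my api.transform.speechToText wrapper does with the blob
const audioFile = new File([eventBlob], recordingFileName, { type: "audio/mp4" });
const formData = new FormData();
formData.append("file", audioFile, recordingFileName);
formData.append("model", "whisper-1");
// formData is then POSTed to https://api.openai.com/v1/audio/transcriptions
// with the Authorization: Bearer <key> header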
I guess I’m forced to use a Python backend which will introduce more latency for a streamed transcription.
This is tough guys - please fix or help.
+1 on the above. This is 100% still an issue; it doesn’t work for me with Next.js on Safari.
+1. My tried and tested solution of ‘do nothing and wait till fixed’ isn’t working out so far.
I am having the same issue; however, I am recording audio from an Android device and trying to have it transcribed. I get the same error message everyone else here is seeing.
This topic has diverged so much over such a long period of time that even understanding what the problem(s) are can take several minutes of reading, and even then the specific details may differ for each user.
Closing this topic so that new topics can be created.
If you have this problem still please open a new topic and give specific details.