Issues with audio files from iOS and the x-m4a format

I have a Node server that accepts audio files from a web app (built in React) and a mobile app (built in React Native). The audio is sent as a blob, and the Node server transcribes it with Whisper.

Blobs that come in from the web app work great and are transcribed as expected, but audio files that come from the iOS app return this error:

Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

The only difference I can see between the blob from the web app and the blob from the iOS app is the mimetype: the web app blob's mimetype is audio/m4a, and the iOS app blob's mimetype is audio/x-m4a.
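Because the mimetype is only a label the client attaches, one way to compare the two uploads beyond that label is to look at their first few bytes, which identify the container whatever the mimetype says. This is a hypothetical diagnostic helper (the name `sniffAudioContainer` is made up for illustration) that checks a handful of well-known signatures on a Node `Buffer`, such as the `buffer` field of the uploaded blob. It is a rough sketch, not a complete format detector.

```javascript
// Guess the container from the first bytes of an upload, independent of the
// mimetype the client claims. Only a few common signatures are checked.
function sniffAudioContainer(buf) {
  // WAV: "RIFF" at byte 0 and "WAVE" at byte 8
  if (buf.length >= 12 &&
      buf.toString("ascii", 0, 4) === "RIFF" &&
      buf.toString("ascii", 8, 12) === "WAVE") {
    return "wav";
  }
  // MP4/M4A (ISO base media): an "ftyp" box type at bytes 4-7
  if (buf.length >= 8 && buf.toString("ascii", 4, 8) === "ftyp") {
    return "mp4/m4a";
  }
  // WebM/Matroska: EBML magic number 0x1A45DFA3
  if (buf.length >= 4 && buf.readUInt32BE(0) === 0x1a45dfa3) {
    return "webm";
  }
  // MP3 with an ID3v2 tag in front
  if (buf.length >= 3 && buf.toString("ascii", 0, 3) === "ID3") {
    return "mp3";
  }
  // MPEG audio frame sync (11 set bits); note raw ADTS AAC also matches this
  if (buf.length >= 2 && buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0) {
    return "mp3";
  }
  return "unknown";
}
```

Logging `sniffAudioContainer(audioBlob.buffer)` for an upload from each app would show whether the iOS blob really is an MP4/M4A container or something else carrying that label.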

In the Node app, I convert the blob to a buffer, then to a file, and send that to Whisper. Here's the code:

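const { toFile } = require("openai");

// audioBlob.buffer already holds the raw upload bytes, so no base64 round trip is needed
const audioBuffer = audioBlob.buffer;

// The filename and type passed here are sent with the upload, whatever format the bytes are really in
const file = await toFile(audioBuffer, "audio.wav", { type: "audio/wav" });

const payload = {
  model: "whisper-1",
  file: file,
};
// More logic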
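The code above always names the upload `audio.wav` with type `audio/wav`, regardless of what the bytes actually contain. A minimal sketch of deriving the extension from the incoming mimetype instead, treating `audio/x-m4a` as an alias of `audio/m4a`, could look like this. `extensionForMimetype` and its mapping table are hypothetical and only cover formats named in the error message.

```javascript
// Hypothetical mapping from upload mimetypes to extensions Whisper accepts.
// audio/x-m4a is treated as an alias of audio/m4a.
const MIME_TO_EXT = {
  "audio/m4a": "m4a",
  "audio/x-m4a": "m4a",
  "audio/mp4": "mp4",
  "audio/mpeg": "mp3",
  "audio/mp3": "mp3",
  "audio/webm": "webm",
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/wave": "wav",
};

function extensionForMimetype(mimetype) {
  // Drop parameters such as "; codecs=opus" and normalize case
  const base = String(mimetype || "").split(";")[0].trim().toLowerCase();
  const ext = MIME_TO_EXT[base];
  if (!ext) {
    throw new Error(`Unsupported audio mimetype: ${mimetype}`);
  }
  return ext;
}
```

It would then be used as ``toFile(audioBlob.buffer, `audio.${extensionForMimetype(audioBlob.mimetype)}`, { type: audioBlob.mimetype })``, assuming `audioBlob` is a multer-style upload object with `buffer` and `mimetype` fields.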

I've tried converting the iOS app's blob to different formats, but I still get the same error from Whisper. Any help figuring out how to use a blob from the iOS app would be appreciated.

Without some sample files there's no way for me to know for sure (I'm not an Apple user), but if I had to make an admittedly biased guess, it's some form of Apple knowing better than everyone else: doing something non-standard and not caring whether it breaks anything outside the Apple ecosystem.

You can try transcoding the iOS audio files to another accepted format with ffmpeg, using something like:

ffmpeg -i input_file.m4a -codec:a libmp3lame -qscale:a 2 output_file.mp3

If Whisper accepts the file after transcoding, you know it's some weird Apple thing, and you can either dig into it further or just live with the transcoding step.
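If the transcoding step ends up being permanent, the same ffmpeg command can be run from the Node server. This sketch assumes ffmpeg is installed and on the `PATH`; `buildMp3TranscodeArgs` and `transcodeToMp3` are hypothetical names, and the only addition to the command is `-y`, so ffmpeg overwrites an existing output file instead of waiting for a prompt.

```javascript
const { spawn } = require("child_process");

// Same arguments as the command-line example: re-encode the audio with
// libmp3lame at VBR quality 2 and write an MP3.
function buildMp3TranscodeArgs(inputPath, outputPath) {
  return ["-i", inputPath, "-codec:a", "libmp3lame", "-qscale:a", "2", outputPath];
}

// Run ffmpeg and resolve with the output path once it exits successfully.
function transcodeToMp3(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", ["-y", ...buildMp3TranscodeArgs(inputPath, outputPath)]);
    let stderr = "";
    proc.stderr.on("data", (chunk) => {
      stderr += chunk; // ffmpeg writes its progress and errors to stderr
    });
    proc.on("error", reject); // e.g. ffmpeg not found on PATH
    proc.on("close", (code) => {
      if (code === 0) {
        resolve(outputPath);
      } else {
        reject(new Error(`ffmpeg exited with code ${code}: ${stderr}`));
      }
    });
  });
}
```

The resulting MP3 could then be passed to Whisper in place of the original iOS file.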

I'm going through the same thing at the moment. I'm pretty sure this is a bug on Apple's side, because I saved a .m4a file to my server and piped it into transcriptions.create directly with no problems.
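The approach described here (saving the .m4a to disk, then passing it straight to `transcriptions.create`) could be sketched as follows, assuming the official `openai` Node package, where `client.audio.transcriptions.create` accepts an `fs.ReadStream` as `file`. The helper names are hypothetical, and the client is passed in as a parameter rather than created inside the function.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write the uploaded bytes to a temp file whose extension matches the real
// container (e.g. "m4a" for an iOS recording).
function saveUploadToTemp(buffer, extension) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "upload-"));
  const filePath = path.join(dir, `audio.${extension}`);
  fs.writeFileSync(filePath, buffer);
  return filePath;
}

// Stream the saved file to the transcription endpoint and return the text.
// `client` is expected to be an OpenAI client instance, e.g. `new OpenAI()`.
async function transcribeSavedFile(client, filePath) {
  const result = await client.audio.transcriptions.create({
    file: fs.createReadStream(filePath),
    model: "whisper-1",
  });
  return result.text;
}
```

Reading from a file on disk lets the SDK pick up the `.m4a` filename from the stream's path, which is one difference from building the file in memory.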

Hey @mail44, I was actually able to fix this by changing how the file was encoded in my mobile app. I might be able to help you out if that route would work for you too. Feel free to ping me.

The solution for me ended up being to change the encoding of the audio file within my mobile app: I switched it to encode the file in the ‘wav’ format.

Hey! I don’t think this forum supports DMs? At least I don’t see it. Can you email me at [my email]? I would greatly appreciate it!

Hey, hit me up on LinkedIn and I'll see if I can help out: johngoodman09