Issues with audio files from iOS and the x-m4a format

I have a Node server that accepts audio files from a web app (built in React) and a mobile app (built in React Native). The audio arrives as a blob, and the Node server transcribes it with Whisper.

Blobs that come in from the web app work great and are transcribed as expected, but the audio files that come from the iOS app return this error:

Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

The only difference I can see between the blob from the web app and the blob from the iOS app is the mimetype: the web app blob’s mimetype is audio/m4a and the iOS app blob’s mimetype is audio/x-m4a.

In the node app, I convert the blob to a buffer then to a file and send that to Whisper. Here’s that code:

const audioAsString = audioBlob.buffer.toString('base64');
const audioBuffer = Buffer.from(audioAsString, "base64");

const file = await toFile(audioBuffer, "audio.wav", {contentType: "audio/wav"});

const payload = {
    model: "whisper-1",
    file: file
  };

openai.audio.transcriptions.create(payload).then((response) => {
  // More logic 
});

I’ve tried converting the iOS app’s blob to different formats, but I still get the same error from Whisper. Any help figuring out how to use a blob from the iOS app would be appreciated.

Without some sample files there isn’t any way for me to know for sure (not an Apple user), but if I had to make my overly biased guess, I’d say it’s some form of Apple knowing better than everyone else, doing something non-standard, and not caring if it breaks anything outside the Apple ecosystem.

You can try transcoding the iOS audio files to another accepted format using ffmpeg with something like:

ffmpeg -i input_file.m4a -codec:a libmp3lame -qscale:a 2 output_file.mp3

If Whisper accepts it after transcoding, you’ll know it’s some weird Apple thing, and you can either try to dig into it further or just live with the transcoding step.
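
If you want to do that transcode on the Node side before handing the buffer to Whisper, a rough sketch might look like the following. It assumes the ffmpeg binary is installed on the server, and the helper name and temp-file handling are just illustrative:

// Hypothetical helper: transcode an m4a buffer to mp3 by shelling out to ffmpeg.
// Assumes ffmpeg is installed and on the server's PATH.
const { execFile } = require("child_process");
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

async function transcodeToMp3(inputBuffer) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "whisper-"));
  const inputPath = path.join(dir, "input.m4a");
  const outputPath = path.join(dir, "output.mp3");

  await fs.writeFile(inputPath, inputBuffer);

  // Same ffmpeg command as above, just run from Node
  await new Promise((resolve, reject) => {
    execFile(
      "ffmpeg",
      ["-i", inputPath, "-codec:a", "libmp3lame", "-qscale:a", "2", outputPath],
      (err) => (err ? reject(err) : resolve())
    );
  });

  const mp3Buffer = await fs.readFile(outputPath);
  await fs.rm(dir, { recursive: true, force: true });
  return mp3Buffer;
}

You could then pass the returned buffer to toFile with an audio/mpeg content type instead of the original blob.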

Going through the same thing at the moment. I’m pretty sure this is a bug on the part of Apple because I saved a .m4a to my server and piped that into transcriptions.create directly with no problems.
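
For reference, the direct pipe looks roughly like this (a minimal sketch with the Node openai SDK; the file path is just a placeholder):

// Minimal sketch: transcribe an m4a that is already saved on the server
const fs = require("fs");
const { OpenAI } = require("openai");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

openai.audio.transcriptions
  .create({
    model: "whisper-1",
    file: fs.createReadStream("saved-recording.m4a"), // placeholder path
  })
  .then((response) => {
    console.log(response.text);
  });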

Hey @mail44, I was actually able to fix this by changing how the file was encoded within my mobile app. I might be able to help you out if this route would work for you too. Feel free to give me a ping.

The solution for me ended up being to change the encoding of the audio file within my mobile app. I was able to encode the file in the 'wav' format.

Hey! I don’t think this forum supports DMs? At least I don’t see it. Can you email me at [my email]? I would greatly appreciate it!

I don’t think I ran into issues with the x-m4a format? I did something like this:

import io

# Inside an async route handler; `file` is the uploaded audio,
# `filename` and `content_type` come from the upload, and `client` is the OpenAI client
contents = await file.read()
file_like = io.BytesIO(contents)

# The SDK accepts a (filename, bytes, content_type) tuple as the file argument
file_data = (filename, file_like.read(), content_type)

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=file_data
)

Hi! I am building a mobile app using React Native and testing on iOS. I face the same issue. I tried using ffmpeg-kit-react-native to convert the recording from the default m4a to mp3, but it still fails with this error. Can I know more about how you did your encoding?

What package are you using to record the audio? I’m using react-native-audio-recorder-player and was able to set the encoding in the config for the recorder.

I am using the expo-av package. It also lets me set the encoding before recording, but only wav and m4a passed (mp3 gives a weird "not supported by iOS" error). Both wav and m4a still fail in Whisper with the "invalid format" error.
I will try react-native-audio-recorder-player! Hopefully it will work.

I had to set the file name to be filename.wav and the AVFormatIDKeyIOS to be 'wav'. Good luck, hope this helps.
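
Roughly, the recorder setup I’m describing looks like the sketch below. The enum names come from react-native-audio-recorder-player’s audioSet options, so double-check them against the version you have installed:

// Sketch of the recorder config; verify the exact enum names for your library version
import AudioRecorderPlayer, {
  AVEncodingOption,
  AVEncoderAudioQualityIOSType,
  AudioEncoderAndroidType,
  AudioSourceAndroidType,
} from 'react-native-audio-recorder-player';

const audioRecorderPlayer = new AudioRecorderPlayer();

const audioSet = {
  // iOS: ask for wav (linear PCM) instead of the default m4a/AAC
  AVFormatIDKeyIOS: AVEncodingOption.wav,
  AVEncoderAudioQualityKeyIOS: AVEncoderAudioQualityIOSType.high,
  AVNumberOfChannelsKeyIOS: 1,
  // Android left on common defaults
  AudioEncoderAndroid: AudioEncoderAndroidType.AAC,
  AudioSourceAndroid: AudioSourceAndroidType.MIC,
};

const onStartRecord = async () => {
  // Note the .wav file name; depending on the platform you may need a full path
  const uri = await audioRecorderPlayer.startRecorder('filename.wav', audioSet);
  console.log('Recording to', uri);
};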

I have tried react-native-audio-recorder-player and got it to successfully record and play the audio (right before I send it to Whisper - you can see the commented-out code). I have also set the file format to wav. However, it still shows me the same invalid format error. Do you mind taking a look at my code? I wonder if I missed some small detail, as I am pretty new to React Native.

Thanks!


[Screenshot of the recording code: Screenshot 2024-07-19 at 1.49.15 PM]

Ah, ya, I forgot some extra steps in the process. The recording code looks good and basically mirrors what I have.

After I record the audio on the frontend, I pass the audio URI to a function that converts it to a blob and sends the data to my Node server:

formData.append("file", {
    name: "audio.wav",
    type: 'audio/mpeg',
    uri: audioFileUri
  });

const config = {
    method: 'post',
    url: url,
    headers: { "Content-Type": "multipart/form-data" },
    data : formData,
  };

return axios(config).then(function (response) { 
  // Handle the transcription
});

Then, in my Node server, you have to convert the blob to a buffer (Whisper can take a buffer):

const express = require('express');
const router = express.Router();

const multer = require('multer');
const storage = multer.memoryStorage();
const upload = multer({ storage: storage }).single('file');

const { OpenAI, toFile } = require("openai");
const { Buffer } = require("buffer");



router.post("/", (req, res) => {
  // The multer callback is async so we can await toFile below
  upload(req, res, async function (err) {

    const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});
    const audioBlob = req.file;

    // You can console.log the audioBlob here to check whether the mimetype is actually 'wav' and not 'x-m4a'

    const audioAsString = audioBlob.buffer.toString('base64');
    const audioBuffer = Buffer.from(audioAsString, "base64");

    const file = await toFile(audioBuffer, "audio.wav", {contentType: "audio/wav"});

    const payload = {
      model: "whisper-1",
      file: file,
      prompt: defaultPrompt
    };

    openai.audio.transcriptions.create(payload).then((response) => {
      // Handle the response -> response.text is the transcription
    });

  });
});

OK! I was also wondering whether I need to add a server running in the backend. I realized that this API does not work well with React Native. (And the client should not store the API key, etc.)

Thanks for sharing the code!

Right, ya, you can’t send the audio file to Whisper from the frontend, so you need a backend server. Let me know if you have any other questions along the way.