Issues with audio files from iOS and the x-m4a format

I have a Node server that accepts audio files from a web app (built in React) and a mobile app (built in React Native). The audio arrives as a blob, and the Node server transcribes it with Whisper.

Blobs that come in from the web app work great and are transcribed as expected, but the audio files that come from the iOS app return this error:

Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

The only difference I can see between the blob from the web app and the blob from the iOS app is the mimetype: the web app blob’s mimetype is audio/m4a and the iOS app blob’s mimetype is audio/x-m4a.

In the node app, I convert the blob to a buffer then to a file and send that to Whisper. Here’s that code:

const audioAsString = audioBlob.buffer.toString('base64');
const audioBuffer = Buffer.from(audioAsString, "base64");

const file = await toFile(audioBuffer, "audio.wav", {contentType: "audio/wav"});

const payload = {
    model: "whisper-1",
    file: file
  };

openai.audio.transcriptions.create(payload).then((response) => {
  // More logic 
});

I’ve tried converting the iOS app’s blob to different formats, but I still get the same error from Whisper. Any help figuring out how to use a blob from the iOS app would be appreciated.

Without some sample files there isn’t any way for me to know for sure (not an Apple user), but if I had to make my overly biased guess, I’d say it’s some form of Apple knowing better than everyone else, doing something non-standard, and not caring if it breaks anything outside the Apple ecosystem.

You can try transcoding the iOS audio files to another accepted format using ffmpeg with something like:

ffmpeg -i input_file.m4a -codec:a libmp3lame -qscale:a 2 output_file.mp3

If Whisper accepts it after transcoding, you’ll know it’s some weird Apple thing, and you can either try to dig into it further or just live with the transcoding step.
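
If you want to do that transcode on the Node side before handing the buffer to Whisper, a rough sketch might look like the following. It assumes the ffmpeg binary is installed on the server, and the helper name and temp-file handling are just illustrative:

// Hypothetical helper: transcode an m4a buffer to mp3 by shelling out to ffmpeg.
// Assumes ffmpeg is installed and on the server's PATH.
const { execFile } = require("child_process");
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

async function transcodeToMp3(inputBuffer) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "whisper-"));
  const inputPath = path.join(dir, "input.m4a");
  const outputPath = path.join(dir, "output.mp3");

  await fs.writeFile(inputPath, inputBuffer);

  // Same ffmpeg command as above, just run from Node
  await new Promise((resolve, reject) => {
    execFile(
      "ffmpeg",
      ["-i", inputPath, "-codec:a", "libmp3lame", "-qscale:a", "2", outputPath],
      (err) => (err ? reject(err) : resolve())
    );
  });

  const mp3Buffer = await fs.readFile(outputPath);
  await fs.rm(dir, { recursive: true, force: true });
  return mp3Buffer;
}

You could then pass the returned buffer to toFile with an audio/mpeg content type instead of the original blob.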

Going through the same thing at the moment. I’m pretty sure this is a bug on the part of Apple because I saved a .m4a to my server and piped that into transcriptions.create directly with no problems.
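
For reference, the direct pipe looks roughly like this (a minimal sketch with the Node openai SDK; the file path is just a placeholder):

// Minimal sketch: transcribe an m4a that is already saved on the server
const fs = require("fs");
const { OpenAI } = require("openai");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

openai.audio.transcriptions
  .create({
    model: "whisper-1",
    file: fs.createReadStream("saved-recording.m4a"), // placeholder path
  })
  .then((response) => {
    console.log(response.text);
  });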

Hey @mail44, I was actually able to fix this by changing how the file was encoded within my mobile app. I might be able to help you out if this route would work for you too. Feel free to give me a ping.

The solution for me ended up being to change the encoding of the audio file within my mobile app. I was able to encode the file in the 'wav' format.

Hey! I don’t think this forum supports DMs? At least I don’t see it. Can you email me at [my email]? I would greatly appreciate it!

I don’t think I ran into issues with the x-m4a format? I did something like this:

import io

# Inside an async route handler; `file` is the uploaded audio,
# `filename` and `content_type` come from the upload, and `client` is the OpenAI client
contents = await file.read()
file_like = io.BytesIO(contents)

# The SDK accepts a (filename, bytes, content_type) tuple as the file argument
file_data = (filename, file_like.read(), content_type)

transcription = client.audio.transcriptions.create(
    model="whisper-1",
    file=file_data
)

Hi! I am building a mobile app using React Native and testing on iOS. I face the same issue. I tried using ffmpeg-kit-react-native to convert the recording from the default m4a to mp3, but it still fails with this error. Can I know more about how you did your encoding?

What package are you using to record the audio? I’m using react-native-audio-recorder-player and was able to set the encoding in the config for the recorder.

I am using the expo-av package. It also lets me set the encoding before recording, but only wav and m4a passed (mp3 gives a weird "not supported by iOS" error). Both wav and m4a still fail in Whisper with the "invalid format" error.
I will try react-native-audio-recorder-player! Hopefully it will work.

I had to set the file name to be filename.wav and the AVFormatIDKeyIOS to be 'wav'. Good luck, hope this helps.
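
Roughly, the recorder setup I’m describing looks like the sketch below. The enum names come from react-native-audio-recorder-player’s audioSet options, so double-check them against the version you have installed:

// Sketch of the recorder config; verify the exact enum names for your library version
import AudioRecorderPlayer, {
  AVEncodingOption,
  AVEncoderAudioQualityIOSType,
  AudioEncoderAndroidType,
  AudioSourceAndroidType,
} from 'react-native-audio-recorder-player';

const audioRecorderPlayer = new AudioRecorderPlayer();

const audioSet = {
  // iOS: ask for wav (linear PCM) instead of the default m4a/AAC
  AVFormatIDKeyIOS: AVEncodingOption.wav,
  AVEncoderAudioQualityKeyIOS: AVEncoderAudioQualityIOSType.high,
  AVNumberOfChannelsKeyIOS: 1,
  // Android left on common defaults
  AudioEncoderAndroid: AudioEncoderAndroidType.AAC,
  AudioSourceAndroid: AudioSourceAndroidType.MIC,
};

const onStartRecord = async () => {
  // Note the .wav file name; depending on the platform you may need a full path
  const uri = await audioRecorderPlayer.startRecorder('filename.wav', audioSet);
  console.log('Recording to', uri);
};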

I have tried react-native-audio-recorder-player and got it to successfully record and play the audio (right before I send it to Whisper - you can see the commented-out code). I have also set the file format to wav. However, it still shows me the same invalid format error. Do you mind taking a look at my code? I wonder if I missed some small detail, as I am pretty new to React Native.

Thanks!


[Screenshot of the recording code: Screenshot 2024-07-19 at 1.49.15 PM]

Ah, ya, I forgot some extra steps in the process. The recording code looks good and basically mirrors what I have.

After I record the audio on the frontend, I pass the audio URI to a function that converts it to a blob and sends the data to my Node server:

formData.append("file", {
    name: "audio.wav",
    type: 'audio/mpeg',
    uri: audioFileUri
  });

const config = {
    method: 'post',
    url: url,
    headers: { "Content-Type": "multipart/form-data" },
    data : formData,
  };

return axios(config).then(function (response) { 
  // Handle the transcription
});

Then, in my Node server, you have to convert the blob to a buffer (Whisper can take a buffer):

const express = require('express');
const router = express.Router();

const multer = require('multer');
const storage = multer.memoryStorage();
const upload = multer({ storage: storage }).single('file');

const { OpenAI, toFile } = require("openai");
const { Buffer } = require("buffer");



router.post("/", (req, res) => {
  // The multer callback is async so we can await toFile below
  upload(req, res, async function (err) {

    const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});
    const audioBlob = req.file;

    // You can console.log the audioBlob here to check whether the mimetype is actually 'wav' and not 'x-m4a'

    const audioAsString = audioBlob.buffer.toString('base64');
    const audioBuffer = Buffer.from(audioAsString, "base64");

    const file = await toFile(audioBuffer, "audio.wav", {contentType: "audio/wav"});

    const payload = {
      model: "whisper-1",
      file: file,
      prompt: defaultPrompt
    };

    openai.audio.transcriptions.create(payload).then((response) => {
      // Handle the response -> response.text is the transcription
    });

  });
});

OK! I was also wondering whether I need to add a server running in the backend. I realized that this API does not work well with React Native. (And the client should not store the API key, etc.)

Thanks for sharing the code!

Right, ya, you can’t send the audio file to Whisper from the frontend, so you need a backend server. Let me know if you have any other questions along the way.