I have a Node server that accepts audio files from a web app (built in React) and a mobile app (built in React Native). The audio arrives as a Blob, and the Node server transcribes it with Whisper.
Blobs that come in from the web work great and are transcribed as expected. But the audio files that come from the iOS app return the error:
The only difference I can tell between the blob from the web app and the blob from the iOS app is the mimetype: the web app's blob is audio/m4a, while the iOS app's blob is audio/x-m4a.
In the Node app, I convert the blob to a buffer, then to a file, and send that to Whisper. Here's that code:
I've tried converting the iOS app's blob to different formats, but I still get the same error from Whisper. Any help figuring out how to use a blob from the iOS app would be appreciated.
Without some sample files there isn't any way for me to know for sure (I'm not an Apple user), but if I had to make my admittedly biased guess, it's some form of Apple knowing better than everyone else: doing something non-standard and not caring whether it breaks anything outside the Apple ecosystem.
You can try transcoding the iOS audio files to some other format that Whisper accepts using ffmpeg, with something like:
If Whisper accepts it after transcoding, you know it's some weird Apple thing, and you can either dig into it further or just live with the transcoding step.
Going through the same thing at the moment. I'm pretty sure this is a bug on Apple's part, because I saved a .m4a to my server and piped it into transcriptions.create directly with no problems.
Hey @mail44, I was actually able to fix this by changing how the file was encoded within my mobile app. I might be able to help you out if this route would work for you too. Feel free to give me a ping.
The solution for me ended up being to change the encoding of the audio file within my mobile app: I encoded the file in the "wav" format instead.
Hi! I am building a mobile app using React Native and testing on iOS. I'm facing the same issue, and I tried using ffmpeg-kit-react-native to convert the recording from the default m4a to mp3, but it still fails the same way. Could you share more about how you did your encoding?
What package are you using to record the audio? I'm using react-native-audio-recorder-player and was able to set the encoding in the recorder's config.
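For anyone else landing here, a sketch of that recorder config might look like the following. The option and enum names follow react-native-audio-recorder-player's README as I recall it; verify them against your installed version, since this is written from memory and is not the poster's actual code:

```javascript
// Sketch: configure react-native-audio-recorder-player to record
// uncompressed (WAV-style) audio on iOS via the audioSet options.
import AudioRecorderPlayer, {
  AVEncodingOption,
  AudioEncoderAndroidType,
  AudioSourceAndroidType,
} from 'react-native-audio-recorder-player';

const recorder = new AudioRecorderPlayer();

const audioSet = {
  AudioEncoderAndroid: AudioEncoderAndroidType.AAC,
  AudioSourceAndroid: AudioSourceAndroidType.MIC,
  // lpcm = linear PCM, i.e. uncompressed WAV-style audio on iOS
  AVFormatIDKeyIOS: AVEncodingOption.lpcm,
  AVNumberOfChannelsKeyIOS: 1,
};

async function startWavRecording() {
  // Pass a path ending in .wav so the container matches the encoding;
  // the path below is a placeholder for wherever your app writes files.
  return recorder.startRecorder('recording.wav', audioSet);
}
```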
I am using the expo-av package. It also lets me set the encoding before recording, but only wav and m4a are accepted (mp3 throws a weird "not supported by iOS" error), and neither wav nor m4a works with Whisper; both return the "invalid format" error.
I will try react-native-audio-recorder-player! Hopefully it will work.
I have tried react-native-audio-recorder-player and got it to successfully record and play the audio right before I send it to Whisper (you can see the commented-out code), and I have set the file format to wav. However, it still shows me the same invalid-format error. Do you mind taking a look at my code? I wonder if I missed some small detail, as I am pretty new to React Native.
OK! I was also wondering whether I need a server running in the backend. I realized that this API does not work well with React Native directly (and the client should not store the API key, etc.).
Right, ya, you can't send the audio file to Whisper from the frontend, so you need a backend server. Let me know if you have any other questions along the way.