Hi, I have a web app in Nuxt 3 with a FastAPI backend.
I tried recording audio in every browser and sending the blob from Nuxt to the FastAPI endpoint, which takes the blob, creates a temp file, and feeds it to the Whisper API. Interestingly, it works in every browser except Safari on iPhones.
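For anyone following along, here is a rough sketch of the "blob to temp file" step on the backend, using only the standard library. The MIME-to-suffix table and the helper names (`suffix_for`, `save_blob_to_temp`) are my own illustration, not the poster's code, and it assumes the temp file's extension should match the container the browser reports (Chrome typically sends audio/webm, Safari audio/mp4).

```python
import os
import tempfile

# Hypothetical mapping from the blob's MIME type to a file suffix.
SUFFIX_BY_MIME = {
    "audio/webm": ".webm",
    "audio/mp4": ".mp4",
    "audio/wav": ".wav",
    "audio/mpeg": ".mp3",
    "audio/ogg": ".ogg",
}


def suffix_for(content_type: str) -> str:
    """Drop parameters such as ';codecs=opus' and look up a suffix."""
    base = content_type.split(";", 1)[0].strip().lower()
    return SUFFIX_BY_MIME.get(base, ".bin")


def save_blob_to_temp(data: bytes, content_type: str) -> str:
    """Write the uploaded bytes to a temp file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    with tempfile.NamedTemporaryFile(
        suffix=suffix_for(content_type), delete=False
    ) as tmp:
        tmp.write(data)
        return tmp.name
```

The returned path is what would then be opened and handed to the transcription call; remember to `os.remove` it once the request is done.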
Every time I make a call from Safari on an iPhone, I get this error
Nothing remarkable there except the duplicated moov atom (Safari bug?), so I fired up a hex editor and removed the extra moov atom. The new file doesn’t have the warning and plays fine in any player, but it still gives me the same error.
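If you want to check for the duplicated moov atom without a hex editor, a small script can walk the top-level atoms of the file. This is only a sketch: it assumes the duplicate sits at the top level of the file (it does not descend into nested atoms), and the function names are made up for illustration.

```python
import struct


def list_top_level_atoms(data: bytes):
    """Return (type, offset, size) for each top-level MP4 atom in data."""
    atoms = []
    offset = 0
    while offset + 8 <= len(data):
        # Each atom starts with a 32-bit big-endian size and a 4-byte type.
        size, kind = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the atom runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed header, stop rather than loop forever
        atoms.append((kind.decode("latin-1"), offset, size))
        offset += size
    return atoms


def count_atoms(data: bytes, kind: str) -> int:
    """Count top-level atoms of the given four-character type."""
    return sum(1 for k, _, _ in list_top_level_atoms(data) if k == kind)
```

Running `count_atoms(open("buffer.mp4", "rb").read(), "moov")` on the Safari recording should report more than one moov if the duplicate is at the top level.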
Copying the stream to a new file with ffmpeg -i buffer.mp4 -c copy test.mp4 gives a file that works with the transcription API just fine, which leads me to conclude that something minor about Safari’s container packaging is tripping up the Whisper API, but… why? Whatever it is, it doesn’t seem to make the file invalid.
I’d really rather not have to run my recordings through ffmpeg before submission.
Apologies if this is an unhelpful comment, audio is not my domain, but is it related to the codec? When I record through Chrome, I get codec = opus. When I record through Safari, I get codec = aac. Chrome works, Safari does not.
I thought that might be the case, but AAC generated by anything other than Safari also works. In fact, the stream copy experiment only changes the metadata in the file, not the coded audio, and that works too.
This pipes the audio coming from Safari into ffmpeg, and pipes the output of ffmpeg back into a buffer, without touching disk and without transcoding. This is the fastest way I can think of.
The issue with piping is that ffmpeg has to write the output in one pass. It can’t write most of the file and then go back to the header to update it, so the output can’t have a moov atom. Other, more natural formats than -f ipod work too if you drop the moov atom, but there seems to be a big performance penalty: the API takes up to 30% more time to process them.
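Since the backend in this thread is Python, here is a minimal sketch of the same piping idea with `subprocess`. It is not the code from the post above. It assumes ffmpeg is on the PATH, and the `-movflags frag_keyframe+empty_moov` option is my own assumption for getting the MP4-family muxer to write to a non-seekable pipe; the helper names are hypothetical.

```python
import subprocess
from typing import List


def build_remux_command(container: str = "ipod") -> List[str]:
    """Build an ffmpeg command that remuxes stdin to stdout without re-encoding."""
    return [
        "ffmpeg",
        "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",          # read the Safari recording from stdin
        "-c", "copy",            # stream copy, no transcoding
        # Assumption: fragmented output so the muxer never needs to seek back.
        "-movflags", "frag_keyframe+empty_moov",
        "-f", container,         # "ipod" is the format named in the thread
        "pipe:1",                # write the result to stdout
    ]


def remux_in_memory(audio: bytes, container: str = "ipod") -> bytes:
    """Pipe audio through ffmpeg and return the remuxed bytes, never touching disk."""
    result = subprocess.run(
        build_remux_command(container),
        input=audio,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raises CalledProcessError if ffmpeg exits non-zero
    )
    return result.stdout
```

The bytes returned by `remux_in_memory` can be wrapped in an `io.BytesIO` and sent on to the transcription request in place of the original upload.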
I can’t figure out how to get the Whisper API to accept the mp4 produced by Safari using the HTML5 MediaRecorder API
I am trying to use the MediaRecorder HTML5 API to record audio from the user’s microphone and then send it to Whisper. The mp4 file that Safari produces is rejected by the Whisper API. If I convert this file to mp3, it works fine, but I need to avoid this step.
Thanks all for the comments. I tried every approach I could, but it still doesn’t work. I tried the mp3, wav, and mp4 formats, but no luck. Personally, I feel it is an API issue, because the audio records and plays back fine, but when it is sent to the Whisper API it isn’t recognised.
The workaround I am currently using, until OpenAI fixes their API endpoint, is to load the MediaRecorder polyfill for Safari only:
Even though Safari now fully implements the MediaRecorder API, it is evidently producing MP4 files that OpenAI does not like. With the polyfill, Safari instead produces WAV files, which OpenAI happily accepts.
Of course the ideal solution is for OpenAI to fix their API, but for now this works. The downsides are that you have to load the polyfill (it’s quite small though) and that the resulting WAV files are much larger than MP4, WebM, etc.