Whisper: problem with audio/mp4 blobs from Safari

Reproduction steps:
Record audio/mp4, codecs=mp4a with Safari's MediaRecorder
Send it to Whisper (I tried sending it through both Postman and Node.js)
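
For reference, the upload step might look roughly like the sketch below in Node.js 18+. It assumes the OpenAI `/v1/audio/transcriptions` endpoint and the `whisper-1` model, which the post does not name explicitly. The function names and the `recording.mp4` file name are made up for illustration.

```javascript
// Sketch only: upload a recorded blob for transcription from Node.js 18+.
// Assumed: OpenAI's /v1/audio/transcriptions endpoint and the "whisper-1" model.
// Hypothetical: the function names and the default file name.
function buildTranscriptionForm(audioBlob, filename = "recording.mp4") {
  const form = new FormData();
  form.append("file", audioBlob, filename); // the file name carries the extension
  form.append("model", "whisper-1");
  return form;
}

async function transcribe(audioBlob, apiKey) {
  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audioBlob), // fetch sets the multipart boundary itself
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status} ${await response.text()}`);
  }
  const { text } = await response.json();
  return text;
}
```

Logging the returned `text` next to the recording length makes the cut-off easy to spot.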

The transcription cuts off after 1 to 3 words every time, even for short recordings (under 10 seconds)

Expected results:
The whole recording is transcribed, up to the 25 MB file size limit that Whisper states

Additional testing:
Playback of sample files (VLC/Apple Music): works
FFmpeg conversion of sample mp4 files to mp3/AAC/ogg (opus): works

As a temporary workaround, I will convert the files to another codec and format with FFmpeg in the application’s backend. It’s very frustrating when the product team claims that speech recognition works with m4a and mp4 files, but it’s broken. By the way, it was working fine before.
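
A backend conversion like the one described could be sketched as follows in Node.js. It assumes an `ffmpeg` binary on the PATH that was built with `libmp3lame`. The function names and the choice of mono MP3 are illustrative, not something the post specifies.

```javascript
const { execFile } = require("node:child_process");

// Sketch only: re-encode an uploaded recording to MP3 before transcription.
// Assumed: ffmpeg is installed and includes the libmp3lame encoder.
// Hypothetical: the function names and the mono MP3 target.
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    "-y",                 // overwrite the output file if it already exists
    "-i", inputPath,      // input: the mp4 recorded by Safari
    "-vn",                // drop any video stream
    "-ac", "1",           // downmix to mono
    "-c:a", "libmp3lame", // encode the audio as MP3
    outputPath,
  ];
}

function convertToMp3(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    execFile("ffmpeg", buildFfmpegArgs(inputPath, outputPath), (error) => {
      if (error) reject(error);
      else resolve(outputPath);
    });
  });
}
```

Calling `await convertToMp3("in.mp4", "out.mp3")` before the upload keeps the workaround in one place, so it is easy to remove later.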

I think I had this problem. Try mediaRecorder.start(1000)

I posted about it in another long thread, but it won’t let me link it here :man_shrugging:
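
A minimal sketch of that suggestion, assuming `stream` comes from `getUserMedia`: passing a timeslice to `start()` makes the recorder emit a chunk every second, and the chunks are joined into one blob on stop. The function name, the options object, and the injectable `RecorderCtor` (there only so the sketch can be exercised outside a browser) are hypothetical.

```javascript
// Sketch only: record with a 1000 ms timeslice and collect the chunks.
// Hypothetical: startChunkedRecording, its options, and RecorderCtor.
function startChunkedRecording(
  stream,
  { RecorderCtor = globalThis.MediaRecorder, timesliceMs = 1000 } = {}
) {
  const recorder = new RecorderCtor(stream);
  const chunks = [];

  // Called once per timeslice and once more when recording stops.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  // Resolves with a single blob built from all collected chunks.
  const finished = new Promise((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType }));
  });

  recorder.start(timesliceMs); // the timeslice argument is the change suggested here

  return { recorder, finished };
}
```

In the browser this would be used as `const { recorder, finished } = startChunkedRecording(stream);`, then `recorder.stop()` and `const blob = await finished;` before uploading.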

mediaRecorder.start(1000) worked for me on macOS 13.3.1, Safari 16.4. Thank you so much @keizo.

[UPDATE: :warning: the solution did work for me. The text below is left for reference]

Hi all,

The mediaRecorder.start(1000) solution didn’t work for me, neither in Safari on macOS nor in Chrome on iOS.

I ended up using this 8-year-old Recorderjs* library. It returns the blob in wav format, which Whisper handles well.

  • Because I can’t include links, please search for mattdiamond/Recorderjs on GitHub.

My bad. The mediaRecorder.start(1000) solution did work. The problem was on the app side, not with the model.