Whisper: problem with audio/mp4 blobs from Safari

Reproduction steps:
Record audio as audio/mp4; codecs=mp4a with Safari's MediaRecorder
Send it to Whisper (I tried sending it through Postman and Node.js)

Results:
Recognition cuts off after 1 to 3 words every time, even for short recordings (under 10 seconds)

Expected results:
Recognition covers the whole recording, for files up to the 25 MB limit that Whisper states

Additional testing:
Playback of sample files (VLC/Apple Music): works
FFmpeg conversion of sample mp4 files to mp3/AAC/ogg (opus): works

As a temporary workaround, I will convert the files to another codec and format with FFmpeg in the application's backend. It's very frustrating when the product team claims speech recognition works for m4a as well as mp4 files, but it's broken. By the way, it was working fine before.
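A minimal sketch of that backend workaround in Node.js, assuming an `ffmpeg` binary is installed and on the PATH. The function names `buildFfmpegArgs` and `transcode` and the file paths are made up for illustration. The output codec is left to FFmpeg, which chooses it from the output file extension.

```javascript
const { spawn } = require("node:child_process");

// Build the ffmpeg argument list (hypothetical helper, kept separate so it
// can be checked without running ffmpeg).
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    "-y",            // overwrite the output file if it already exists
    "-i", inputPath, // the audio/mp4 recording coming from Safari
    "-vn",           // drop any video stream, keep audio only
    outputPath,      // ffmpeg picks container and codec from the extension
  ];
}

// Run ffmpeg and resolve with the output path once it exits successfully.
function transcode(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
      stdio: "ignore",
    });
    proc.on("error", reject); // e.g. ffmpeg is not installed
    proc.on("close", (code) =>
      code === 0
        ? resolve(outputPath)
        : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

Something like `await transcode("upload.mp4", "upload.mp3")` would then run before the file is passed on to Whisper.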

I think I had this problem. Calling mediaRecorder.start(1000) fixed it for me.

I posted about it in another long thread, but the forum won't let me link it here :man_shrugging:
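For anyone landing here, a rough sketch of what the fix can look like. `startRecording` is a made-up helper name, and the `Recorder` parameter is only there so the logic can be tried outside a browser; in Safari it defaults to the built-in `MediaRecorder`. The argument to `start()` is a timeslice in milliseconds, so `dataavailable` fires about once per second and the chunks are joined into a single blob when recording stops.

```javascript
// Start recording and return a stop() function that resolves with one Blob.
function startRecording(stream, timesliceMs = 1000, Recorder = globalThis.MediaRecorder) {
  const recorder = new Recorder(stream);
  const chunks = [];

  // With a timeslice, this fires roughly every `timesliceMs` milliseconds.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) {
      chunks.push(event.data);
    }
  };

  recorder.start(timesliceMs);

  return function stop() {
    return new Promise((resolve) => {
      // Join all collected chunks into a single blob once recording ends.
      recorder.onstop = () => {
        resolve(new Blob(chunks, { type: recorder.mimeType }));
      };
      recorder.stop();
    });
  };
}
```

In the browser this would be used with a stream from `navigator.mediaDevices.getUserMedia({ audio: true })`: call `startRecording(stream)` when the user starts talking and `await stop()` when they finish, then upload the resulting blob.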

mediaRecorder.start(1000) worked for me on macOS 13.3.1, Safari 16.4. Thank you so much @keizo.

[UPDATE: :warning: the solution did work for me. The text below is left for reference]

Hi all,

The mediaRecorder.start(1000) solution didn't work for me. Not in Safari on macOS and not in Chrome on iOS.

I ended up using this 8-year-old Recorderjs* library. It returns the blob in wav format, which Whisper handles well.

* Because I can't include links, please search for mattdiamond/Recorderjs on GitHub.

My bad. The mediaRecorder.start(1000) solution did work. The problem was on the app side, not on the model.

It worked for me too. I tried other solutions (e.g. using mic-recorder-to-mp3 as described in community[dot]openai[dot]com/whisper-api-only-transcribing-first-few-seconds/457663/7), but mediaRecorder.start(1000) plus a short prompt to Whisper made my day.
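A sketch of how the recorded blob and a short prompt could be sent to Whisper. It assumes the OpenAI transcription endpoint with its `file`, `model`, and optional `prompt` form fields, and a runtime with global `fetch` and `FormData` (browsers, Node.js 18+). The helper names and the `recording.mp4` file name are placeholders, not anything from this thread.

```javascript
// Build the multipart form for the transcription request (hypothetical helper).
function buildTranscriptionForm(blob, prompt) {
  const form = new FormData();
  form.append("file", blob, "recording.mp4"); // placeholder file name
  form.append("model", "whisper-1");
  if (prompt) {
    form.append("prompt", prompt); // optional hint about vocabulary or style
  }
  return form;
}

// Send the recording and return the transcribed text.
async function transcribe(blob, apiKey, prompt) {
  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(blob, prompt),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  const { text } = await response.json();
  return text;
}
```

The API key should stay on the server, so in a web app `transcribe` would normally run in the backend rather than in Safari itself.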

Implementing the solution: 1 Second

Finding the right words to Google: :exploding_head:

Yes, this is like the 30th article I have read that was related to my question, and I only found this article's ID in a different article that I barely found.

Hi everyone,

Does anyone have an idea how to record audio in an iPhone (iOS) PWA in standalone mode? My PWA works well in Safari, but when I add it to the home screen and run it in standalone mode, it asks for permission to access audio. I grant permission, but the audio is not recorded; it just produces a beep sound. Does anyone know how to fix this?

Thanks,
Usama

That worked for me as well. Thank you so much. I spent an hour trying other solutions before I found this one, which was an immediate fix. Cheers!

But won't this record only one second, or am I wrong?

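No. The argument to start() is a timeslice in milliseconds, not a recording length: the recorder hands over a chunk of data about every second and keeps recording until stop() is called. I think the fix has to do with how the audio is chunked.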