Reproduction steps:
Record audio as audio/mp4 (codecs=mp4a) with Safari's MediaRecorder
Send it to Whisper (I tried both Postman and Node.js)
Results:
Transcription cuts off after 1 to 3 words every time, even for short recordings (under 10 seconds)
Expected results:
The whole recording is transcribed, up to the 25 MB file size limit that Whisper states
Additional testing:
Playback of sample files (VLC/Apple Music) +
FFmpeg conversion of sample mp4 files to mp3/AAC/ogg(opus) +
I will add a temporary workaround in the application backend that uses FFmpeg to convert files to another codec and format. It's very frustrating when the product team claims speech recognition works for m4a as well as mp4, but it's broken. By the way, it was working fine before.
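For anyone who wants to try the same workaround, here is a rough sketch of how the backend conversion could look in Node.js. It assumes an `ffmpeg` binary is on the PATH and was built with `libmp3lame`. The function names `buildFfmpegArgs` and `convertToMp3` and the 64k bitrate are my own placeholders, not anything from the Whisper docs.

```typescript
import { spawn } from "node:child_process";

// Build the ffmpeg argument list for re-encoding a recording to MP3.
// Hypothetical helper; the bitrate is an arbitrary choice.
function buildFfmpegArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",                 // overwrite the output file if it already exists
    "-i", inputPath,      // input file, e.g. the audio/mp4 recording from Safari
    "-vn",                // drop any video stream
    "-c:a", "libmp3lame", // encode the audio as MP3
    "-b:a", "64k",        // audio bitrate
    outputPath,
  ];
}

// Run ffmpeg and resolve once it exits successfully.
function convertToMp3(inputPath: string, outputPath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn("ffmpeg", buildFfmpegArgs(inputPath, outputPath), {
      stdio: "ignore",
    });
    child.on("error", reject); // e.g. ffmpeg is not installed
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

The converted file would then be uploaded to Whisper instead of the original mp4.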
It worked for me too. I tried other solutions (e.g. using mic-recorder-to-mp3 as described in community[dot]openai[dot]com/whisper-api-only-transcribing-first-few-seconds/457663/7); mediaRecorder.start(1000) together with a short prompt to Whisper made my day.
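In case it helps, this is roughly what the `start(1000)` approach could look like: the recorder emits a chunk about every second, and the chunks are merged into one Blob when recording stops. `RecorderLike` and `recordInChunks` are hypothetical names of mine. The interface is only a stand-in for the parts of the browser's `MediaRecorder` that the sketch touches, so treat it as an illustration rather than a verified fix.

```typescript
// Stand-in for the parts of the browser MediaRecorder used below (assumption).
interface RecorderLike {
  ondataavailable: ((event: any) => void) | null;
  onstop: ((event: any) => void) | null;
  start(timeslice?: number): void;
  stop(): void;
}

// Start recording with a timeslice and collect the emitted chunks.
// Returns a handle whose stop() resolves with the merged recording.
function recordInChunks(
  recorder: RecorderLike,
  mimeType: string,
  timesliceMs: number = 1000,
): { stop: () => Promise<Blob> } {
  const chunks: Blob[] = [];
  const finished = new Promise<Blob>((resolve) => {
    recorder.ondataavailable = (event) => {
      // Skip empty chunks; keep everything else in arrival order.
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    recorder.onstop = () => resolve(new Blob(chunks, { type: mimeType }));
  });
  recorder.start(timesliceMs); // emit a chunk roughly every timesliceMs
  return {
    stop: () => {
      recorder.stop();
      return finished;
    },
  };
}
```

In the browser, the recorder would be something like `new MediaRecorder(stream, { mimeType: "audio/mp4" })`, and the resulting Blob is what gets sent to Whisper.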
Does anyone know how to record audio in a PWA running in standalone mode on an iPhone (iOS)? My PWA works well in Safari, but when I add it to the home screen and run it in standalone mode, it asks for permission to access audio. I grant permission, but the audio is not recorded; it just produces a beep sound. Does anyone know how to fix this?