Reproduction steps:
Record audio in Safari with MediaRecorder (audio/mp4, codecs=mp4a)
Send the recording to the Whisper API (tried both through Postman and from Node.js)
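For reference, here is a minimal sketch of the Node.js request. It assumes OpenAI's /v1/audio/transcriptions endpoint, the "whisper-1" model name, and Node 18+ (global fetch, FormData and Blob). The helper names buildTranscriptionForm and transcribe are hypothetical and only illustrate the request shape.

```typescript
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

// Assumption: OpenAI's transcription endpoint and the "whisper-1" model name.
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical helper: builds the multipart body (audio file + model name).
function buildTranscriptionForm(audio: Uint8Array, fileName: string, mimeType: string): FormData {
  const form = new FormData();
  // Copy into a plain Uint8Array so it is accepted as a BlobPart.
  form.append("file", new Blob([new Uint8Array(audio)], { type: mimeType }), fileName);
  form.append("model", "whisper-1");
  return form;
}

// Hypothetical helper: uploads a Safari recording and returns the transcript text.
async function transcribe(path: string, apiKey: string): Promise<string> {
  const audio = await readFile(path);
  const form = buildTranscriptionForm(audio, basename(path), "audio/mp4");
  const res = await fetch(TRANSCRIPTION_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  }
  const json = (await res.json()) as { text: string };
  return json.text;
}
```

With a Safari mp4 file, the returned text is where the truncation shows up.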
Results:
The transcription always cuts off after 1 to 3 words, even for short recordings (under 10 seconds)
Expected results:
The whole recording is transcribed, for files up to the 25 MB limit stated in the Whisper documentation
Additional testing:
Playback of the sample files in VLC and Apple Music: works
FFmpeg conversion of the sample mp4 files to mp3, AAC and ogg (Opus): works
As a temporary workaround, I will add a step to the application's backend that uses FFmpeg to convert the files to a different codec and format. It's very frustrating that the product team advertises speech recognition support for both m4a and mp4, yet it's broken. By the way, it used to work fine.
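A minimal sketch of that backend workaround, assuming the ffmpeg binary is on the PATH and built with libmp3lame and libopus. The names ffmpegArgs, convertAudio and TargetFormat are hypothetical. It re-encodes the Safari recording to mp3 or ogg (Opus) before upload.

```typescript
import { spawn } from "node:child_process";

// Hypothetical: the two target formats used in the workaround.
type TargetFormat = "mp3" | "ogg";

// Hypothetical helper: builds the ffmpeg argument list.
// -y overwrites the output, -vn drops any video stream, -c:a picks the audio encoder.
function ffmpegArgs(input: string, output: string, target: TargetFormat): string[] {
  const codec = target === "mp3" ? "libmp3lame" : "libopus";
  return ["-y", "-i", input, "-vn", "-c:a", codec, output];
}

// Hypothetical helper: runs ffmpeg and rejects with its stderr on failure.
function convertAudio(input: string, output: string, target: TargetFormat): Promise<void> {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", ffmpegArgs(input, output, target), {
      stdio: ["ignore", "ignore", "pipe"],
    });
    let stderr = "";
    proc.stderr?.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    proc.on("error", reject); // e.g. ffmpeg is not installed
    proc.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffmpeg exited with code ${code}: ${stderr}`));
    });
  });
}
```

For example, `await convertAudio("recording.mp4", "recording.mp3", "mp3")` produces an mp3 file that can then be sent to Whisper in place of the original.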