Problem
Whisper doesn’t seem to work with mp4 files, and the error message is very unhelpful. There was a thread, “whisper-api-completely-wrong-for-mp4/289256”, that was closed, but the problem was never resolved other than by the advice to “not use mp4”.
Request
Please fix mp4 support or remove it as a supported file type from the whisper API.
More Details
What’s weird is that this code works after simply changing the file extension to m4a. I think this should be discouraged because, as discussed in the above thread, an mp4 may contain streams besides audio (video, for example), which would make the below break.
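For anyone who wants to try the rename workaround, here is a rough sketch of what it might look like. This is not the original poster's code. It assumes Node.js 18+ (for the global fetch, FormData and Blob), the /v1/audio/transcriptions endpoint with the whisper-1 model, and an API key passed in by the caller. The helper names toM4aName and transcribe are made up for illustration.

```javascript
// Sketch only: upload an MP4 to the transcription endpoint under an .m4a name.
// Assumes Node.js 18+ (global fetch, FormData, Blob).
const { readFile } = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: keep the base name but swap a trailing .mp4 for .m4a.
function toM4aName(filePath) {
  return path.basename(filePath).replace(/\.mp4$/i, '.m4a');
}

// Hypothetical helper: send the file bytes unchanged, only the upload name differs.
async function transcribe(filePath, apiKey) {
  const data = await readFile(filePath);

  const form = new FormData();
  form.append('file', new Blob([data]), toM4aName(filePath));
  form.append('model', 'whisper-1');

  const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });

  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status} ${await response.text()}`);
  }
  return (await response.json()).text;
}

// Example call (needs a real file and key):
// transcribe('recording.mp4', process.env.OPENAI_API_KEY).then(console.log);
```

Note that only the file name changes here, the bytes are untouched, so a file that also carries a video stream is still uploaded in full.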
It’s possible to demux an MP4 file and extract the first audio stream using Node.js. You can use the mp4box module to achieve this. Here’s a step-by-step guide on how to do it:
Install the mp4box module:
npm install mp4box
Create a script to demux the MP4 file and extract the first audio stream:
Replace 'input.mp4' and 'output.m4a' with the paths to your input MP4 file and the desired output M4A file, respectively.
This script will demux the input MP4 file, extract the first audio stream, and save it as an M4A file. Note that this script does not remove metadata from the audio stream. If you want to remove metadata, you can use the ffmpeg library with the fluent-ffmpeg wrapper for Node.js.
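As an alternative to the mp4box route, the same extraction can be sketched by calling the ffmpeg command-line tool directly from Node.js. This is a different technique from the fluent-ffmpeg wrapper mentioned above, and it assumes ffmpeg is installed and on the PATH and that the audio stream uses a codec the M4A container accepts (typically AAC). The function names buildFfmpegArgs and extractFirstAudio are hypothetical.

```javascript
// Sketch only: extract the first audio stream of an MP4 into an M4A file
// by invoking the ffmpeg CLI (assumed to be installed and on PATH).
const { execFile } = require('node:child_process');

// Hypothetical helper: build the ffmpeg argument list.
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    '-y',                  // overwrite the output file if it already exists
    '-i', inputPath,       // input MP4
    '-map', '0:a:0',       // select only the first audio stream of the input
    '-vn',                 // make sure no video is written
    '-c:a', 'copy',        // copy the audio as-is, no re-encoding
    '-map_metadata', '-1', // drop global metadata from the output
    outputPath,
  ];
}

// Hypothetical helper: run ffmpeg and resolve with the output path on success.
function extractFirstAudio(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const args = buildFfmpegArgs(inputPath, outputPath);
    execFile('ffmpeg', args, (error, _stdout, stderr) => {
      if (error) {
        reject(new Error(`ffmpeg failed: ${stderr || error.message}`));
      } else {
        resolve(outputPath);
      }
    });
  });
}

// Example call:
// extractFirstAudio('input.mp4', 'output.m4a').then(console.log, console.error);
```

Because the audio is stream-copied rather than re-encoded, this should be fast and lossless, but it will fail if the audio codec cannot be stored in an M4A file. In that case replacing 'copy' with 'aac' re-encodes the audio instead.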
I am having the same issue here. Recording as mp4 works on Safari, but Whisper does not work well with mp4. Here are the details of the recording on Safari: “Blob size: 88068 bytes, type: audio/mp4, format: mp4”