Whisper doesn't work with mp4

Problem
Whisper doesn’t seem to work with mp4 files, and the error message is very unhelpful. There was an earlier thread, “whisper-api-completely-wrong-for-mp4/289256”, that was closed, but the problem was never resolved other than with the advice to “not use mp4”.

Request
Please either fix mp4 support or remove mp4 from the file types the Whisper API lists as supported.

More Details
What’s weird is that the code below works after simply changing the file extension to m4a. I think this workaround should be discouraged because, as discussed in the above thread, an mp4 may contain streams besides audio (video, subtitles), which would make it break.

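  // Run as an ES module (top-level await).
  import fs from "fs";
  import OpenAI from "openai";

  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  // Copy the bytes unchanged; only the file extension differs.
  const fileContent = fs.readFileSync('input.mp4');
  fs.writeFileSync('output.m4a', fileContent);

  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("output.m4a"),
    model: "whisper-1",
  });

  console.log(transcription.text);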
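
For anyone wondering why the rename trick works at all: the container type is in the file's bytes, not in its extension. Here is a minimal sketch that reads the major brand from the leading `ftyp` box. It assumes the file starts with an `ftyp` box, which is typical for MP4/M4A but not guaranteed, and `readMajorBrand` / `readMajorBrandFromFile` are hypothetical helper names I made up for illustration.

```javascript
const fs = require('fs');

// Return the major brand from a leading `ftyp` box, or null if the
// buffer is too short or does not start with an `ftyp` box.
function readMajorBrand(buffer) {
  if (buffer.length < 12) return null;
  // Bytes 0..4 are the box size, 4..8 the box type, 8..12 the major brand.
  if (buffer.toString('latin1', 4, 8) !== 'ftyp') return null;
  return buffer.toString('latin1', 8, 12);
}

// Hypothetical helper: read only the first 12 bytes of a file on disk.
function readMajorBrandFromFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(12);
    const bytesRead = fs.readSync(fd, header, 0, 12, 0);
    return readMajorBrand(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

Calling `readMajorBrandFromFile` on input.mp4 and on the renamed output.m4a should report the same brand, since the two files are byte-for-byte identical.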

Ideally, I would use ffmpeg or some actual conversion, but I’m currently using serverless and that seems like a total pain to get set up there.
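
For reference, where an ffmpeg binary is available, the "actual conversion" could look roughly like this. It is a sketch under assumptions: `ffmpeg` is on the PATH, the audio codec in the mp4 (usually AAC) is one an m4a container can hold, and `buildExtractArgs` / `extractFirstAudioStream` are names I invented for the example.

```javascript
const { spawn } = require('child_process');

// Build ffmpeg arguments that copy the first audio stream without re-encoding.
function buildExtractArgs(inputPath, outputPath) {
  return [
    '-y',             // overwrite the output file if it already exists
    '-i', inputPath,
    '-map', '0:a:0',  // select only the first audio stream of the first input
    '-c:a', 'copy',   // copy the audio as-is instead of re-encoding it
    outputPath,
  ];
}

// Run ffmpeg and resolve with the output path once it exits successfully.
function extractFirstAudioStream(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const child = spawn('ffmpeg', buildExtractArgs(inputPath, outputPath), {
      stdio: 'ignore',
    });
    child.on('error', reject); // e.g. the ffmpeg binary was not found
    child.on('close', (code) => {
      if (code === 0) resolve(outputPath);
      else reject(new Error(`ffmpeg exited with code ${code}`));
    });
  });
}
```

Usage would be `await extractFirstAudioStream('input.mp4', 'output.m4a')`, then passing output.m4a to the transcription call. Because the stream is copied rather than re-encoded, this should be fast, but it does nothing for the serverless setup problem.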

I have an AI I can ask…


It’s possible to demux an MP4 file and extract the first audio stream using Node.js. You can use the mp4box module to achieve this. Here’s a step-by-step guide on how to do it:

  1. Install the mp4box module:
npm install mp4box
  2. Create a script to demux the MP4 file and extract the first audio stream:
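const fs = require('fs');
const MP4Box = require('mp4box');

// Writes the first audio track of inputFile to outputFile.
// Assumes the mp4box 0.5.x API; newer releases may differ.
// The output is a fragmented MP4 holding only the audio track; whether
// the Whisper API accepts it is untested.
function demuxMP4(inputFile, outputFile) {
  const mp4boxFile = MP4Box.createFile();
  const output = fs.createWriteStream(outputFile);

  mp4boxFile.onError = (err) => console.error('mp4box error:', err);

  mp4boxFile.onReady = (info) => {
    const audioTrack = info.audioTracks[0];
    if (!audioTrack) {
      console.error('No audio track found');
      output.end();
      return;
    }

    // Request segments for the audio track only.
    mp4boxFile.setSegmentOptions(audioTrack.id, null, { nbSamples: 1000 });
    // The initialization segment has to be written before any media segment.
    const [init] = mp4boxFile.initializeSegmentation();
    output.write(Buffer.from(init.buffer));
    mp4boxFile.start();
  };

  mp4boxFile.onSegment = (id, user, buffer, sampleNumber, isLast) => {
    output.write(Buffer.from(buffer));
    if (isLast) output.end();
  };

  // mp4box expects ArrayBuffers tagged with their byte offset in the file.
  const data = fs.readFileSync(inputFile);
  const arrayBuffer = data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength);
  arrayBuffer.fileStart = 0;
  mp4boxFile.appendBuffer(arrayBuffer);
  mp4boxFile.flush();
}

const inputFile = 'input.mp4';
const outputFile = 'output.m4a';

demuxMP4(inputFile, outputFile);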

Replace 'input.mp4' and 'output.m4a' with the paths to your input MP4 file and the desired output M4A file, respectively.

This script will demux the input MP4 file, extract the first audio stream, and save it as an M4A file. Note that this script does not remove metadata from the audio stream. If you want to remove metadata, you can use the ffmpeg library with the fluent-ffmpeg wrapper for Node.js.
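
The fluent-ffmpeg route mentioned above might look something like the following. This is only a sketch: it assumes the fluent-ffmpeg package and an ffmpeg binary are both installed, it relies on ffmpeg's `-map_metadata -1` option to drop global metadata, and `extractAudioWithoutMetadata` is a made-up name. The third parameter exists only so the wrapper can be swapped out; normally you would omit it.

```javascript
// `-map_metadata -1` tells ffmpeg not to copy global metadata to the output.
const STRIP_METADATA_OPTIONS = ['-map_metadata', '-1'];

// Copy the audio out of inputPath into outputPath, dropping video and metadata.
// `ffmpeg` defaults to the fluent-ffmpeg module, loaded only when needed.
function extractAudioWithoutMetadata(
  inputPath,
  outputPath,
  ffmpeg = require('fluent-ffmpeg')
) {
  return new Promise((resolve, reject) => {
    ffmpeg(inputPath)
      .noVideo()                             // drop any video streams
      .audioCodec('copy')                    // keep the audio without re-encoding
      .outputOptions(STRIP_METADATA_OPTIONS)
      .on('end', () => resolve(outputPath))
      .on('error', reject)
      .save(outputPath);
  });
}
```

Usage would be `await extractAudioWithoutMetadata('input.mp4', 'output.m4a')`.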