Whisper doesn't work with mp4

bscue · February 10, 2024, 7:21pm

Problem
Whisper doesn’t seem to work with mp4 files, and the error message is very unhelpful. There was a thread, “whisper-api-completely-wrong-for-mp4/289256”, that was closed, but the problem was never resolved other than with the advice to “not use mp4”.

Request
Please fix mp4 support or remove it as a supported file type from the whisper API.

More Details
What’s weird is that this code works by simply changing the file extension to m4a. I think this should be discouraged because, as discussed in the above thread, an mp4 may contain tracks besides audio (video, for example), which would make the code below break.

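  import fs from "fs";
  import OpenAI from "openai";

  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  // Copy the bytes unchanged; only the file extension differs.
  const fileContent = fs.readFileSync('input.mp4');
  fs.writeFileSync('output.m4a', fileContent);

  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("output.m4a"),
    model: "whisper-1",
  });

  console.log(transcription.text);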

Ideally, I would use ffmpeg or some actual conversion, but I’m currently using serverless and that seems like a total pain to get set up there.
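
In case it helps anyone relying on the rename trick: a rough sketch of a guard that checks whether the mp4 really is audio-only before renaming it. It walks the ISO BMFF box tree (moov > trak > mdia > hdlr) and collects each track's handler type, where "soun" means audio and "vide" means video. The names listTrackHandlers and isAudioOnly are made up for this example, and I have only considered plain, non-fragmented files, so treat it as a starting point rather than a full parser.

```javascript
// Hypothetical helper: collect the handler type of every track in an MP4 buffer.
function listTrackHandlers(buf, start = 0, end = buf.length, out = []) {
  const containers = new Set(['moov', 'trak', 'mdia']);
  let pos = start;
  while (pos + 8 <= end) {
    let size = buf.readUInt32BE(pos);
    const type = buf.toString('latin1', pos + 4, pos + 8);
    let header = 8;
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (pos + 16 > end) break;
      size = Number(buf.readBigUInt64BE(pos + 8));
      header = 16;
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the enclosing range.
      size = end - pos;
    }
    if (size < header || pos + size > end) break; // malformed or truncated box
    if (containers.has(type)) {
      listTrackHandlers(buf, pos + header, pos + size, out);
    } else if (type === 'hdlr' && header + 12 <= size) {
      // hdlr payload: version/flags (4), pre_defined (4), handler_type (4), ...
      out.push(buf.toString('latin1', pos + header + 8, pos + header + 12));
    }
    pos += size;
  }
  return out;
}

// Hypothetical helper: true only if at least one track exists and all are audio.
function isAudioOnly(buf) {
  const handlers = listTrackHandlers(buf);
  return handlers.length > 0 && handlers.every((h) => h === 'soun');
}
```

Usage would be something like isAudioOnly(fs.readFileSync('input.mp4')) before writing output.m4a, falling back to a real conversion when it returns false.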

_j · February 10, 2024, 7:35pm

I have an AI I can ask…

It’s possible to demux an MP4 file and extract the first audio stream using Node.js. You can use the mp4box module to achieve this. Here’s a step-by-step guide on how to do it:

Install the mp4box module:

npm install mp4box

Create a script to demux the MP4 file and extract the first audio stream:

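const fs = require('fs');
const MP4Box = require('mp4box');

// Written against the mp4box 0.5.x API (createFile / appendBuffer / onSegment).
function demuxMP4(inputFile, outputFile) {
  const mp4boxfile = MP4Box.createFile();
  const chunks = [];

  mp4boxfile.onError = (err) => console.error('MP4 parse error:', err);

  mp4boxfile.onReady = (info) => {
    const audioTrack = info.tracks.find((track) => track.type === 'audio');
    if (!audioTrack) {
      console.error('No audio track found');
      return;
    }

    // Request segments for the audio track only and collect them in order.
    mp4boxfile.onSegment = (id, user, buffer) => chunks.push(Buffer.from(buffer));
    mp4boxfile.setSegmentOptions(audioTrack.id, null, { nbSamples: 1000 });
    const initSegments = mp4boxfile.initializeSegmentation();
    initSegments.forEach((seg) => chunks.push(Buffer.from(seg.buffer)));
    mp4boxfile.start();
  };

  // mp4box expects an ArrayBuffer tagged with its offset in the file.
  const data = fs.readFileSync(inputFile);
  const arrayBuffer = data.buffer.slice(data.byteOffset, data.byteOffset + data.byteLength);
  arrayBuffer.fileStart = 0;
  mp4boxfile.appendBuffer(arrayBuffer);
  mp4boxfile.flush();

  // The result is a fragmented MP4 that contains only the audio track.
  if (chunks.length > 0) {
    fs.writeFileSync(outputFile, Buffer.concat(chunks));
  }
}

const inputFile = 'input.mp4';
const outputFile = 'output.m4a';

demuxMP4(inputFile, outputFile);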

Replace 'input.mp4' and 'output.m4a' with the paths to your input MP4 file and the desired output M4A file, respectively.

This script will demux the input MP4 file, extract the first audio stream, and save it as an M4A file. Note that this script does not remove metadata from the audio stream. If you want to remove metadata, you can use the ffmpeg library with the fluent-ffmpeg wrapper for Node.js.
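
For the ffmpeg route, here is a minimal sketch that calls the ffmpeg command-line tool directly through Node's child_process instead of going through fluent-ffmpeg. It assumes an ffmpeg binary is available on PATH and that the audio codec in the mp4 (typically AAC) can be copied into an m4a container without re-encoding. The function names buildFfmpegArgs and extractAudio are made up for this example.

```javascript
const { execFile } = require('child_process');

// Build the ffmpeg arguments: overwrite output (-y), drop video (-vn),
// copy the audio stream as-is (-c:a copy), strip global metadata (-map_metadata -1).
function buildFfmpegArgs(inputFile, outputFile) {
  return ['-y', '-i', inputFile, '-vn', '-c:a', 'copy', '-map_metadata', '-1', outputFile];
}

// Assumption: an `ffmpeg` executable is installed and on PATH.
function extractAudio(inputFile, outputFile) {
  return new Promise((resolve, reject) => {
    execFile('ffmpeg', buildFfmpegArgs(inputFile, outputFile), (err) => {
      if (err) reject(err);
      else resolve(outputFile);
    });
  });
}
```

Calling extractAudio('input.mp4', 'output.m4a') returns a promise that resolves with the output path once ffmpeg exits successfully.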

info14 · April 24, 2024, 9:31am

I have the same issue here. My use case is sending an audio file from an iOS browser (where the “.webm” mime-type is not supported) to the server.

navid_dev · May 29, 2024, 3:30am

I am having the same issue here. Recording to mp4 works on Safari, but Whisper does not work well with mp4. Here are the details of the recording on Safari: “Blob size: 88068 bytes, type: audio/mp4, format: mp4”
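
Going by the rename workaround reported at the top of this thread, one thing to try is uploading the Safari recording under an .m4a filename rather than .mp4. A small sketch of picking the filename from the blob's MIME type follows. The function name uploadNameForBlobType and the mapping table are my own assumptions, not something from the API docs, and I have not confirmed that this fixes every case.

```javascript
// Hypothetical helper: choose an upload filename from a recording's MIME type.
// The mapping is an assumption based on the workaround described in this thread.
function uploadNameForBlobType(mimeType, baseName = 'recording') {
  const extensions = {
    'audio/mp4': 'm4a', // Safari / iOS recordings
    'audio/webm': 'webm',
    'audio/mpeg': 'mp3',
    'audio/wav': 'wav',
  };
  // Drop parameters such as ";codecs=..." before the lookup.
  const essence = String(mimeType).split(';')[0].trim().toLowerCase();
  const ext = extensions[essence];
  if (!ext) {
    throw new Error(`Unsupported recording type: ${mimeType}`);
  }
  return `${baseName}.${ext}`;
}
```

With the blob described above, uploadNameForBlobType('audio/mp4') gives 'recording.m4a', which can then be used as the filename of the uploaded file.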
