Whisper API cannot read files correctly

adityask2194 · March 10, 2023, 10:20pm

Hi, I have a web app in Nuxt 3 and the backend is in FastAPI.
I tried recording in every browser and sending the audio blob from Nuxt to the FastAPI endpoint, which takes the blob, creates a temp file, and feeds it to the Whisper API. Interestingly, it works in every browser except Safari on iPhones.
Every time I make a call from the Safari browser on an iPhone, I get this error:

openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

On the front end I am using MediaRecorder to take in the media stream, convert it to a blob when recording stops, and send it to the FastAPI endpoint.
The MIME type I am setting is audio/wav.

The same thing works perfectly fine from other browsers on other devices. Do I need to do things differently? Am I missing something?
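
Setting a MIME type on a blob only labels it; it does not change how the browser encoded the audio. One way to see what a browser actually sent is to look at the first bytes of the upload. The following is a minimal sketch in Node.js, not part of the original app: the function name sniffAudioContainer is made up for illustration, and the signatures it checks are the well-known magic numbers for each container.

```javascript
// Guess the container of a recorded audio buffer from its leading bytes.
// Returns a short label, or 'unknown' when no signature matches.
function sniffAudioContainer(buf) {
  if (buf.length < 12) return 'unknown';
  const ascii = (start, end) => buf.toString('latin1', start, end);

  // WAV: "RIFF" <4-byte size> "WAVE"
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WAVE') return 'wav';
  // MP4 / M4A: ISO base media files normally start with an 'ftyp' box
  if (ascii(4, 8) === 'ftyp') return 'mp4';
  // WebM / Matroska: EBML magic number
  if (buf.readUInt32BE(0) === 0x1a45dfa3) return 'webm';
  if (ascii(0, 4) === 'OggS') return 'ogg';
  // MPEG audio: an ID3v2 tag or a frame sync (ADTS AAC also matches the sync test)
  if (ascii(0, 3) === 'ID3') return 'mpeg-audio';
  if (buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0) return 'mpeg-audio';
  return 'unknown';
}
```

If a blob labelled audio/wav comes back as 'mp4', the label and the content disagree, which would be consistent with the ffmpeg output posted later in this thread.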

0x41mmar · March 12, 2023, 2:02am

Same here. I thought maybe Safari on iPhone was declaring the wrong format, so I saved the blob and ran it through ffmpeg, here’s what I got:

[mov,mp4,m4a,3gp,3g2,mj2 @ 00000230b38e01c0] Found duplicated MOOV Atom. Skipped it
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'filename':
  Metadata:
    creation_time   : 2023-03-12T01:37:10.000000Z
    major_brand     : iso5
    minor_version   : 1
    compatible_brands: isomiso5hlsf
  Duration: 00:00:06.22, start: 0.000000, bitrate: 211 kb/s
  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 207 kb/s (default)
    Metadata:
      creation_time   : 2023-03-12T01:37:10.000000Z
      handler_name    : Core Media Audio
      vendor_id       : [0][0][0][0]

Nothing remarkable there except the duplicated moov atom (Safari bug?), so I fired up a hex editor and removed the extra moov atom. The new file doesn’t trigger the warning, plays fine in any player, and still gives me the same error.
Copying the stream to a new file (ffmpeg -i buffer.mp4 -c copy test.mp4) gives a file that works with the transcription API just fine, which leads me to conclude that something minor about Safari’s container packaging is tripping up the Whisper API. But why? Whatever it is, the file does not seem to be invalid.

I’d really rather not have to run my recordings through ffmpeg before submission

DavidTompkins · March 12, 2023, 8:51pm

Apologies if this is an unhelpful comment, audio is not my domain, but is it related to the codec? When I record through Chrome, I get codec = opus. When I record through Safari, I get codec = aac. Chrome works, Safari does not.

0x41mmar · March 13, 2023, 1:00am

I thought that may be the case, but aac generated by anything other than Safari also works. In fact the stream copy experiment only changes the metadata in the file, not the coded audio, and that works too.

adityask2194 · March 13, 2023, 11:31am

I also thought it was the codec, but using Chrome on iPhone doesn’t do the trick either. However, the app works great in Chrome on any device other than an iPhone.

0x41mmar · March 13, 2023, 6:44pm

Does anyone know where I can report this as a bug to OpenAI?

In the meantime, here’s a workaround using ffmpeg:

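const { spawn } = require('child_process');

// Remux (stream copy, no transcoding) an in-memory audio buffer through ffmpeg
// and resolve with the rewritten file as a Buffer. Nothing touches disk.
function pipecopy(inputBuffer) {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn('ffmpeg', [
      '-hide_banner',
      '-i', 'pipe:0',            // read the input from stdin
      '-codec', 'copy',          // copy the audio stream as-is
      '-movflags', 'empty_moov', // write an empty moov atom at the start of the output
      '-f', 'ipod',              // iPod/M4A flavour of ffmpeg's MP4 muxer
      'pipe:1'                   // write the output to stdout
    ]);

    const chunks = [];
    ffmpeg.stdout.on('data', (chunk) => chunks.push(chunk));

    // ffmpeg logs to stderr; drain it so a full pipe cannot stall the process.
    ffmpeg.stderr.resume();

    // Fires when ffmpeg cannot be started (for example, it is not on PATH).
    ffmpeg.on('error', reject);

    // ffmpeg may close stdin early; the exit code below reports the failure.
    ffmpeg.stdin.on('error', () => {});

    // 'close' fires after the process has exited and its stdio streams have ended,
    // so all output has been collected by this point ('exit' can fire too early).
    ffmpeg.on('close', (code, signal) => {
      if (code === 0) {
        resolve(Buffer.concat(chunks));
      } else {
        reject(new Error(`ffmpeg exited with code ${code} and signal ${signal}`));
      }
    });

    ffmpeg.stdin.end(inputBuffer);
  });
}

module.exports = pipecopy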

This pipes the audio coming from Safari into ffmpeg, and pipes the output of ffmpeg back into a buffer, without touching disk and without transcoding. This is the fastest way I can think of.

The issue with piping is that ffmpeg has to write the output in one pass. It can’t write most of the file and then go back to the header to update it, so it can’t write a regular moov atom, which is why the command passes -movflags empty_moov. Other, more natural formats than -f ipod work too if you drop the moov atom, but there seems to be a huge performance penalty: the API takes up to 30% more time to process them.
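
To show where the remuxed buffer goes next, here is a hedged sketch of posting it to the transcription endpoint from Node.js 18 or later, where fetch, FormData, and Blob are globals. The helper names buildTranscriptionForm and transcribe are made up for this example, and the file name recording.m4a is an assumption chosen to match the -f ipod output; the thread does not establish how the API uses the file name.

```javascript
// Build the multipart body for POST https://api.openai.com/v1/audio/transcriptions.
// `audioBuffer` is assumed to be the Buffer produced by the ffmpeg remux step.
function buildTranscriptionForm(audioBuffer, filename = 'recording.m4a') {
  const form = new FormData();
  form.append('model', 'whisper-1');
  form.append('file', new Blob([audioBuffer], { type: 'audio/mp4' }), filename);
  return form;
}

// Send the audio and return the transcribed text.
// `apiKey` is the caller's OpenAI API key.
async function transcribe(audioBuffer, apiKey) {
  const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audioBuffer),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status} ${await response.text()}`);
  }
  return (await response.json()).text;
}
```

With the workaround above, usage would look roughly like transcribe(await pipecopy(safariBuffer), apiKey), where safariBuffer is a placeholder for the raw upload.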

samhincks · March 14, 2023, 10:12am

Same Issue here - with output recorded directly in Logic Pro.

maciekChmura · March 16, 2023, 9:14am

I have the same issue.
It fails only on Safari.

Whole component in a gist if somebody would like to take a look.

https://gist.github.com/maciekChmura/a1c9cfb45f32b72ea6c2b4a4dcfe898f

recorder.tsx

import { useState, useRef } from "react";
import { reportError, getErrorMessage } from "~/utils/error";
import axios from "axios";
import { env } from "~/env.mjs";

type RecordingStatus = "inactive" | "recording" | "paused";

const mimeType = "audio/mp3";
const fileName = "recording.mp3";

WIP Testing how it can work in React and Next.

jovanxua · March 17, 2023, 11:20pm

Same error on my side. Audio recorded in Chrome works, but audio recorded in Safari does not. Any tips? ChatGPT-4 didn’t help hehe.

casimir · March 19, 2023, 11:19pm

This is affecting me as well.

I can’t figure out how to get the Whisper API to accept the mp4 produced by Safari using the HTML5 MediaRecorder API

I am trying to use the MediaRecorder HTML5 API to record audio from the user’s microphone and then send it to Whisper. The mp4 file that Safari produces is rejected by the Whisper API. If I convert this file to mp3, it works fine, but I need to avoid this step.

adityask2194 · March 20, 2023, 12:51pm

Thanks all for the comments. I tried all the possible ways but still, it doesn’t work. Tried mp3, wav, mp4 formats, but no luck. Personally, I feel it is an API issue because the audio is recorded and played but when it is sent to Whisper API it doesn’t recognise it.

Jako · March 20, 2023, 12:55pm

I’m also facing the same issue when using the MediaRecorder in Safari with MP4s.

casimir · March 20, 2023, 4:07pm

The workaround I am currently using, until OpenAI fixes their API endpoint, is to load the MediaRecorder polyfill for Safari only:

Even though Safari now fully implements the MediaRecorder API, it is obviously producing MP4 files that OpenAI does not like. By using the polyfill, Safari instead produces WAV files that OpenAI happily accepts.

Of course the ideal solution is for OpenAI to fix their API, but for now this works. The downsides are that you have to load the polyfill (it’s quite small though) and the resulting WAV files are much larger than MP4/WEBM/Etc.
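
One possible way to load a polyfill only where it is needed is to test what the native recorder can produce rather than sniffing the user agent. The sketch below is an assumption-laden illustration, not the poster's code: needsWavFallback and getRecorderClass are invented names, loadPolyfill stands for whatever import your polyfill requires, and the rule "no WebM support means MP4 output" is a guess based on the reports in this thread.

```javascript
// Decide whether to fall back to a WAV-producing recorder.
// `Recorder` is the browser's MediaRecorder constructor, or undefined if absent.
function needsWavFallback(Recorder) {
  if (typeof Recorder !== 'function') return true; // no native recorder at all
  if (typeof Recorder.isTypeSupported !== 'function') return true;
  // Assumption: a recorder that cannot produce WebM will record MP4,
  // the container reported as failing in this thread.
  return !Recorder.isTypeSupported('audio/webm');
}

// `loadPolyfill` is a hypothetical callback that returns (or resolves to)
// the polyfill's recorder class.
async function getRecorderClass(Recorder, loadPolyfill) {
  return needsWavFallback(Recorder) ? loadPolyfill() : Recorder;
}
```

In a page this would be called with window.MediaRecorder as the first argument. Because the check is on capabilities, it also covers other iPhone browsers that behave like Safari, as reported earlier in the thread.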

WhyTho · March 23, 2023, 10:05pm

I’ve been fighting with this problem and I think there are some versions of ffmpeg that don’t work well with the AAC created by Safari.
Whatever version is on OpenAI’s server might be the root problem.

I suspect this because when I compress files from Safari before sending them to Whisper, it works beautifully on our dev server but not in production. The only major difference I could find was that they have different versions of ffmpeg installed.

I know Whisper uses ffmpeg because I had it running locally for a while, and it’s the most common way to unpack these audio files.

vojto · March 24, 2023, 6:25am

We’re seeing the same issue.

Fix from OpenAI would be ideal. The polyfill looks like the second best solution.

drewlesueur · March 24, 2023, 11:13am

I’m seeing same issue. Hoping it’s fixed on api side. Thank you.

WhyTho · March 24, 2023, 12:37pm

I did some more controlled testing with ffmpeg versions and I just wanted to confirm that older versions cannot handle the m4a created by the web audio API.
Of course, it could be something different altogether on OpenAI’s end, but if you’re trying to capture audio from the browser, this problem will likely keep coming up.
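
To compare environments the way described above, it can help to log the ffmpeg version on each server. This is a small Node.js sketch, assuming ffmpeg is on the PATH; the function names are illustrative, and the thread does not say which version numbers are affected, so no threshold is encoded here.

```javascript
const { execFileSync } = require('child_process');

// Extract the version token from the first "ffmpeg version ..." line
// of `ffmpeg -version` output, or return null if it is not found.
function parseFfmpegVersion(versionOutput) {
  const match = /^ffmpeg version (\S+)/m.exec(versionOutput);
  return match ? match[1] : null;
}

// Run `ffmpeg -version` on this machine and return the parsed version string.
function installedFfmpegVersion() {
  const output = execFileSync('ffmpeg', ['-version'], { encoding: 'utf8' });
  return parseFfmpegVersion(output);
}
```

Logging installedFfmpegVersion() at startup on both dev and production makes a version mismatch visible without logging into either machine.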

DavidTompkins · March 24, 2023, 3:00pm

Is anyone else experiencing the issue on Firefox? I have recordings working on Chrome, but not Safari or Firefox. It is unclear whether that is one issue or two.

RePattern · March 24, 2023, 3:04pm

Yes, experiencing issues on Firefox with webm as mime type - going to iterate through others…
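
Iterating through MIME types can be done programmatically with MediaRecorder.isTypeSupported. The sketch below is only an illustration: pickMimeType and fileNameFor are made-up helpers, and the extension table is limited to formats named in the error message at the top of this thread. The support check is passed in as a function so the logic can also run outside a browser.

```javascript
// Extensions taken from the "Supported formats" list in the error message.
const EXTENSION_BY_CONTAINER = {
  'audio/webm': 'webm',
  'audio/mp4': 'mp4',
  'audio/wav': 'wav',
  'audio/mpeg': 'mp3',
};

// Return the first candidate the recorder reports as supported, or null.
function pickMimeType(candidates, isTypeSupported) {
  return candidates.find((type) => isTypeSupported(type)) ?? null;
}

// Derive a file name from the MIME type the recorder actually used,
// ignoring any ";codecs=..." parameter. Returns null for unlisted containers.
function fileNameFor(mimeType, base = 'recording') {
  const container = mimeType.split(';')[0].trim().toLowerCase();
  const extension = EXTENSION_BY_CONTAINER[container];
  return extension ? `${base}.${extension}` : null;
}
```

In a browser, the check would be (type) => MediaRecorder.isTypeSupported(type), the chosen type would be passed as the mimeType option when constructing the recorder, and the recorder's mimeType property afterwards tells you what it really used. A null from fileNameFor signals a container that is not on the list and probably needs converting first.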

WhyTho · March 24, 2023, 6:03pm

I got another note on this from the server team. Apparently the Ubuntu LTS release (that 1/3 of the internet runs on) comes packaged with the older version of ffmpeg that doesn’t work with that codec. So, don’t be surprised when you struggle to record audio from iOS for the next year.

Here’s a good resource for working with the client end of the problem but you will still struggle sending the audio to openai:
