Whisper API cannot read files correctly

I thought that might be the case, but AAC generated by anything other than Safari also works. In fact, the stream-copy experiment changes only the metadata in the file, not the coded audio, and that works too.

I also thought it was the codec, but using Chrome on an iPhone doesn’t do the trick either. However, the app works great in Chrome on any device other than an iPhone.

Does anyone know where I can report this as a bug to OpenAI?

In the meantime, here’s a workaround using ffmpeg:

const { spawn } = require('child_process');
const { Buffer } = require('buffer');


// Remux (stream-copy) an audio buffer through ffmpeg without touching disk.
async function pipecopy(inputBuffer) {
  const ffmpeg = spawn('ffmpeg', [
    '-hide_banner',
    '-i', 'pipe:0',            // read the input from stdin
    '-codec', 'copy',          // copy the coded audio, no transcoding
    '-movflags', 'empty_moov', // write an initial empty moov so the muxer never seeks back
    '-f', 'ipod',
    'pipe:1'                   // write the output to stdout
  ]);

  const buffer = await new Promise((resolve, reject) => {
    const chunks = [];

    ffmpeg.stdout.on('data', (chunk) => {
      chunks.push(chunk);
    });

    // Fails here if ffmpeg cannot be spawned (for example, it is not installed).
    ffmpeg.on('error', reject);

    // Ignore EPIPE on stdin if ffmpeg exits before reading all of the input.
    ffmpeg.stdin.on('error', () => {});

    // 'close' fires only after stdout has been fully read; 'exit' can fire
    // while output is still buffered, which would truncate the result.
    ffmpeg.on('close', (code, signal) => {
      console.log(`ffmpeg process exited with code ${code} and signal ${signal}`);
      resolve(Buffer.concat(chunks));
    });

    ffmpeg.stdin.write(inputBuffer);
    ffmpeg.stdin.end();
  });

  return buffer;
}


module.exports = pipecopy;

This pipes the audio coming from Safari into ffmpeg and pipes ffmpeg’s output back into a buffer, without touching disk and without transcoding. This is the fastest way I can think of.

The issue with piping is that ffmpeg has to write the output in one pass. It can’t write most of the file and then seek back to the header to update it, so the output can’t have a regular moov atom. Other, more natural formats than -f ipod also work if you drop the moov atom, but there seems to be a huge performance penalty: the API takes up to 30% more time to process them.
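To see what the remux actually changes, it can help to compare the top-level box layout of the file before and after. The following is a minimal sketch, assuming the standard MP4 box header (a 4-byte big-endian size followed by a 4-byte type). The names listTopLevelBoxes and describeLayout are made up for illustration and are not part of any library.

```javascript
// List the top-level boxes ("atoms") of an MP4/M4A buffer in file order.
// Each box starts with a 4-byte big-endian size followed by a 4-byte type.
function listTopLevelBoxes(buf) {
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= buf.length) {
    let size = buf.readUInt32BE(offset);
    const type = buf.toString('latin1', offset + 4, offset + 8);
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (offset + 16 > buf.length) break;
      size = Number(buf.readBigUInt64BE(offset + 8));
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the file.
      size = buf.length - offset;
    }
    if (size < 8) break; // malformed box: stop rather than loop forever
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// Hypothetical helper: summarise the layout as a single string of box types.
function describeLayout(buf) {
  return listTopLevelBoxes(buf).map((box) => box.type).join(' ');
}
```

Running describeLayout on the Safari recording and on the output of pipecopy should show where moov sits in each file. With empty_moov I would expect a moov box near the start followed by moof and mdat fragments, but I have not checked what layout Safari itself writes.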

Same issue here, with output recorded directly in Logic Pro.

I have the same issue.
It fails only on Safari.

Whole component in a gist if somebody would like to take a look.

WIP. Testing how it can work in React and Next.

Same error on my side. It works with audio recorded in Chrome but not with audio recorded in Safari. Any tips? ChatGPT-4 didn’t help, hehe.

This is affecting me as well.

I can’t figure out how to get the Whisper API to accept the mp4 produced by Safari using the HTML5 MediaRecorder API.

I am trying to use the MediaRecorder HTML5 API to record audio from the user’s microphone and then send it to Whisper. The mp4 file that Safari produces is rejected by the Whisper API. If I convert this file to mp3, it works fine, but I need to avoid this step.

Thanks, all, for the comments. I tried every approach I could, but it still doesn’t work. I tried mp3, wav, and mp4 formats, with no luck. Personally, I feel it is an API issue, because the audio records and plays back fine, but the Whisper API doesn’t recognise it when it is sent.

I’m also facing the same issue when using the MediaRecorder in Safari with MP4s.

The workaround I am currently using, until OpenAI fixes their API endpoint, is to load the MediaRecorder polyfill for Safari only:

Even though Safari now fully implements the MediaRecorder API, it is evidently producing MP4 files that OpenAI does not like. With the polyfill, Safari instead produces WAV files, which OpenAI happily accepts.

Of course the ideal solution is for OpenAI to fix their API, but for now this works. The downsides are that you have to load the polyfill (it’s quite small though) and the resulting WAV files are much larger than MP4/WEBM/Etc.
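Because Safari does implement MediaRecorder, feature detection alone cannot decide when to load the polyfill, so the check has to look at the browser itself. Here is a rough sketch using the user agent string. The function name needsRecorderPolyfill is hypothetical, user agent sniffing is fragile, and treating every iOS browser like Safari is an assumption based on the earlier report that Chrome on iPhone fails too.

```javascript
// Decide whether to load a MediaRecorder polyfill, based on the user agent string.
// Assumption: browsers on iOS behave like Safari here, so all of them get the polyfill.
function needsRecorderPolyfill(userAgent) {
  const isIOS = /iPhone|iPad|iPod/.test(userAgent);
  // Desktop Safari has "Safari/" in its user agent but none of the tokens
  // that Chromium-family browsers add.
  const isDesktopSafari =
    /Safari\//.test(userAgent) &&
    !/Chrome\/|Chromium\/|Edg\/|OPR\/|Android/.test(userAgent);
  return isIOS || isDesktopSafari;
}

// In a page, this would run before any MediaRecorder is created:
//   if (needsRecorderPolyfill(navigator.userAgent)) { /* load the polyfill script */ }
```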

I’ve been fighting with this problem, and I think some versions of ffmpeg don’t work well with the AAC created by Safari.
Whatever version is on OpenAI’s server might be the root problem.

I suspect this because when I compress files recorded in Safari before sending them to Whisper, it works beautifully on our dev server but not in production. The only major difference I could find was that the two have different versions of ffmpeg installed.

I know Whisper uses ffmpeg because I had it running locally for a while, and it’s the most common way to unpack these audio files.

We’re seeing the same issue.

Fix from OpenAI would be ideal. The polyfill looks like the second best solution.

I’m seeing the same issue. Hoping it’s fixed on the API side. Thank you.

I did some more controlled testing with ffmpeg versions, and I just wanted to confirm that older versions cannot handle the m4a created by the Web Audio API.
Of course, it could be something different altogether on OpenAI’s end, but if you’re trying to capture audio from the browser, this problem will likely keep coming up.
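For anyone comparing a dev and a production server the same way, a small check like the one below can log which ffmpeg each machine is running. It is only a sketch: parseFfmpegVersion and localFfmpegVersion are hypothetical names, and the thread does not say which version is the cutoff, so the code reports the version without judging it.

```javascript
const { execFileSync } = require('child_process');

// Extract the major/minor version from the first line of `ffmpeg -version` output.
// Returns null for builds whose version is not numeric (for example, git snapshots).
function parseFfmpegVersion(versionOutput) {
  const match = /^ffmpeg version n?(\d+)\.(\d+)/m.exec(versionOutput);
  if (!match) return null;
  return { major: Number(match[1]), minor: Number(match[2]) };
}

// Run the local ffmpeg binary and report its version.
// Returns null if ffmpeg is missing or cannot be executed.
function localFfmpegVersion() {
  try {
    const output = execFileSync('ffmpeg', ['-version'], { encoding: 'utf8' });
    return parseFfmpegVersion(output);
  } catch (err) {
    return null;
  }
}
```

Logging localFfmpegVersion() at startup on both servers makes a version mismatch visible without having to shell into each machine.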

Is anyone else experiencing the issue on Firefox? I have recordings working on Chrome, but not on Safari or Firefox. It is unclear whether that is one issue or two.

Yes, I’m experiencing issues on Firefox with webm as the MIME type. Going to iterate through others…
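Iterating through MIME types can be done in code rather than by hand, since MediaRecorder.isTypeSupported reports what the current browser can record. The sketch below assumes a candidate list and ordering chosen purely for illustration, and pickSupportedMimeType is a made-up name. Note that it only tells you what the browser will record, not what the Whisper API will accept.

```javascript
// Candidate MIME types in order of preference. The list and its order are
// assumptions for illustration; adjust them to what works in your own tests.
const CANDIDATE_MIME_TYPES = [
  'audio/webm;codecs=opus',
  'audio/webm',
  'audio/ogg;codecs=opus',
  'audio/mp4',
];

// Return the first candidate reported as supported, or null if none is.
// The predicate is passed in so the function can be tested outside a browser;
// in a page you would pass (type) => MediaRecorder.isTypeSupported(type).
function pickSupportedMimeType(isTypeSupported, candidates = CANDIDATE_MIME_TYPES) {
  for (const type of candidates) {
    if (isTypeSupported(type)) return type;
  }
  return null;
}
```

The returned value can then be passed as the mimeType option when constructing the MediaRecorder.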

I got another note on this from the server team. Apparently the Ubuntu LTS release (that 1/3 of the internet runs on) comes packaged with the older version of ffmpeg that doesn’t work with that codec. So, don’t be surprised when you struggle to record audio from iOS for the next year.

Here’s a good resource for working with the client end of the problem, but you will still struggle sending the audio to OpenAI:

I modified the suffix, and the OpenAI API accepted the input.

The following guide helped with changing the suffix:

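from pathlib import Path

import openai


def transcribe(audio):
    print(audio)

    # Rename the recording so that it carries a .wav suffix.
    myfile = Path(audio)
    myfile = myfile.rename(myfile.with_suffix('.wav'))

    # openai.Audio is the interface of the openai Python package before 1.0.
    with open(myfile, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcript)
    return transcript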

After an absurd amount of trial and error, I’ve found opus-media-recorder (GitHub: kbumsik/opus-media-recorder), a MediaRecorder polyfill for Opus recording using WebAssembly. It can record webm audio entirely client-side, and the result can be sent to OpenAI.

Having a similar issue with Safari on Mac 12.6.3. Audio from Chrome can be submitted without issue, as long as it is saved first. If I transmit the blob directly via my Flask app, I get the Invalid file format error regardless of whether I use Chrome or Safari. Taking my app to Windows to see if the issue persists.
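Since an earlier reply got the API to accept a file after changing only its suffix, one thing worth checking when a blob is sent directly is the filename attached to the upload. This is not confirmed as the cause here. The sketch below derives a filename from the blob’s MIME type; the mapping table and the name uploadFilename are assumptions for illustration, not an official list.

```javascript
// Map a recorder MIME type to a file extension for the upload filename.
// This table is an assumption for illustration, not an official list.
const EXTENSION_BY_MIME = {
  'audio/webm': 'webm',
  'audio/ogg': 'ogg',
  'audio/mp4': 'mp4',
  'audio/mpeg': 'mp3',
  'audio/wav': 'wav',
  'audio/x-wav': 'wav',
};

// Build a filename such as "recording.webm", or return null for unknown types.
function uploadFilename(mimeType, baseName = 'recording') {
  // Drop parameters such as ";codecs=opus" and normalise the case.
  const essence = String(mimeType || '').split(';')[0].trim().toLowerCase();
  const ext = EXTENSION_BY_MIME[essence];
  return ext ? `${baseName}.${ext}` : null;
}

// In the browser, the name goes in the third argument of FormData.append:
//   formData.append('file', blob, uploadFilename(blob.type));
```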