Whisper API cannot read files correctly

I modified the file suffix and the OpenAI API accepted the input.

The following snippet helped with changing the suffix:

from pathlib import Path

import openai

def transcribe(audio):
    print(audio)

    # Rename the file so it carries a .wav suffix that the API accepts
    myfile = Path(audio)
    myfile = myfile.rename(myfile.with_suffix('.wav'))

    with open(myfile, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcript)

After an absurd amount of trial and error I’ve found GitHub - kbumsik/opus-media-recorder: MediaRecorder polyfill for Opus recording using WebAssembly, which can record webm audio entirely client-side and send it to OpenAI.


Having a similar issue with Safari on Mac 12.6.3. Audio from Chrome can be submitted without issue, as long as it is saved first. If I transmit the blob directly via my Flask app, I get the ‘Invalid file format’ error regardless of whether I use Chrome or Safari. Taking my app to Windows to see if the issue persists.
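For what it’s worth, a minimal sketch of the “saved first” path that works, assuming a Flask endpoint that writes the uploaded blob to a temp file before calling Whisper (the route and form field names are illustrative, not from the original post):

import tempfile

import openai
from flask import Flask, request

app = Flask(__name__)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Save the uploaded blob to disk with a real extension first,
    # rather than streaming the raw blob straight to the API
    blob = request.files["audio"]
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as f:
        blob.save(f.name)
    with open(f.name, "rb") as audio_file:
        return openai.Audio.transcribe("whisper-1", audio_file).text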

Can we get an update here from OpenAI?


Same here,
I want to test Whisper from a Databricks notebook, but it doesn’t recognize the audio files.
Hopefully we get a fix, because I spent quite some time testing different things, none of which worked.

This API is the only thing that works for me. I have tried opus-media-recorder as well but couldn’t get it to work.

GitHub - ai/audio-recorder-polyfill: MediaRecorder polyfill to record audio in Edge and Safari just worked

This is working for me with Chrome on Mac; the audio is generated from a Gradio app microphone:

import os
import subprocess
import tempfile

import openai

def transcribe(audio):
    """Transcribe an audio file using OpenAI's Whisper API"""
    # Reserve a temporary .mp3 path for the converted audio
    input_audio = audio
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_mp3:
        output_audio = temp_mp3.name

    # Use ffmpeg to convert the webm file to a 16 kHz mono mp3
    command = ["ffmpeg", "-y", "-i", input_audio,
               "-vn", "-ar", "16000", "-ac", "1", output_audio]
    subprocess.run(command, check=True)

    # Transcribe the audio
    with open(output_audio, "rb") as audio_file:
        transcription = openai.Audio.transcribe("whisper-1", audio_file).text

    # Remove the temporary files
    os.remove(output_audio)
    os.remove(input_audio)

    return transcription

Tried changing the ‘major brand’ bytes at the beginning of the ‘ftyp’ box from [105, 115, 111, 54] (‘iso6’) to [105, 115, 111, 109] (‘isom’), since ‘isom’ is also listed in the compatible brands. This passes the Whisper type check.
As a result, Whisper now sometimes returns a good result and sometimes returns an empty string. :frowning:
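A minimal sketch of that byte patch, assuming the ‘ftyp’ box sits at the very start of the file (as it does in typical Safari MP4 output); the function name is illustrative:

# Hypothetical sketch: rewrite the 'ftyp' major brand from 'iso6' to 'isom'
# so the file passes Whisper's type check. Box layout at the start of an MP4:
# 4-byte box size, 'ftyp', 4-byte major brand, 4-byte minor version, ...
def patch_major_brand(path):
    with open(path, "r+b") as f:
        header = f.read(12)
        if header[4:8] == b"ftyp" and header[8:12] == b"iso6":
            f.seek(8)
            f.write(b"isom")  # b"isom" == bytes([105, 115, 111, 109])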

Same for me, and all this time I thought it was Safari having the trouble. Using pydub to convert the mp4 recorded by Safari to .wav works for me.
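A minimal sketch of that conversion, assuming pydub and ffmpeg are installed and the Safari recording is saved as recording.mp4 (file names illustrative):

from pydub import AudioSegment

# Decode the Safari mp4 recording and re-export it as .wav for Whisper
AudioSegment.from_file("recording.mp4").export("recording.wav", format="wav")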

Same problem here, audio generated by Chrome via HTML5 is accepted by OpenAI Whisper, audio generated by Safari not.

Metadata of Safari generated file vs Chrome generated via ffmpeg:

Safari:

[fferr] Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '6444093fc6f2c.wav':
[fferr]   Metadata:
[fferr]     major_brand     : iso5
[fferr]     minor_version   : 1
[fferr]     compatible_brands: isomiso5hlsf
[fferr]     creation_time   : 2023-04-22T16:20:10.000000Z
[fferr]   Duration: 00:00:05.25, start: 0.000000, bitrate: 187 kb/s
[fferr]     Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 184 kb/s (default)
[fferr]     Metadata:
[fferr]       creation_time   : 2023-04-22T16:20:10.000000Z
[fferr]       handler_name    : Core Media Audio
[fferr] Stream mapping:
[fferr]   Stream #0:0 -> #0:0 (aac (native) -> aac (native))

Chrome:

[fferr] Input #0, matroska,webm, from '6443fbc4d28f7.wav':
[fferr]   Metadata:
[fferr]     encoder         : Chrome
[fferr]   Duration: N/A, start: 0.000000, bitrate: N/A
[fferr]     Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
[fferr] Stream mapping:
[fferr]   Stream #0:0 -> #0:0 (opus (native) -> aac (native))

Any insights on a solution?

Thank you, this solution appears sensible. Still working well for you?

Meaning, OpenAI’s servers are running Ubuntu LTS with old ffmpeg versions that can’t properly process Safari-generated .wav files?

Experiencing a similar issue on Firefox (though I haven’t checked the type of audio file generated).

Safari doesn’t actually generate wav files; it uses an mp4 encoding. Either way, OpenAI’s servers don’t like it.

Your options are mostly to re-encode on the client side with a polyfill or on the server side with something like pydub. There are also other transcription options out there that don’t have this specific problem.

I have a very strange issue: up to yesterday the OpenAI Whisper API was accepting files recorded from the Voice Memos app on iOS without any issues. Today they’re not working at all, and I keep getting a file format error saying my file format is not supported.

Could this be something they changed on their side? Because I didn’t change anything in my code.

Hi all! We did indeed make a change yesterday to try to support more files as described in this thread. Please let me know if that has helped with things!

@omarsultan - sorry to hear about this. I’ve tried out some voice memo files on my end and they all seem to be working - can you give any more details so I can repro? Maybe iOS version number, file extension, etc?


Hey Michelle,

It turns out that what was causing the crash was some preprocessing I was doing to the file, because it initially didn’t work. But now it just works out of the box with audio files recorded from the Voice Memos app on iOS 16.

Thanks a lot


Thanks to everyone for the responses. It has started working for me now after the change to the API itself.


I still get this error when uploading an MP3 - can you advise?
<class 'openai.error.InvalidRequestError'>: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']


I’m using the record plugin on a Flutter app (record | Flutter Package) to capture audio. The generated audio files were working with the Whisper API until recently. I thought the problem was in how I was sending the request (e.g. MIME types), but I can now verify with two audio files (Android and iOS) that if I send them with the provided curl call, the Android-generated file goes through, but the iOS-generated file is rejected as an invalid format (both files are audio/mp4). Is there any way I can provide the files for testing and improving the API, or do you recommend I transcode the iOS files (as others have done)? What’s the best way to approach this?
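For reference, the standard curl call to the transcription endpoint looks like this (the file name is illustrative):

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@recording.mp4" \
  -F model="whisper-1"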

@michellep thoughts?

I figured it out: I just changed the file ending on iOS to the same one voice notes use (.m4a) rather than .mp4, even though the file is of type ‘audio/mp4’. On Android it works with the .mp4 file ending. I’m using the record Flutter package.
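A minimal server-side sketch of the same workaround, assuming the upload arrives with a .mp4 suffix (the function name is illustrative, not from the original post):

from pathlib import Path

def fix_ios_extension(path):
    # Whisper accepts iOS 'audio/mp4' recordings when they carry .m4a
    p = Path(path)
    return p.rename(p.with_suffix(".m4a")) if p.suffix == ".mp4" else p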

Whisper API audio files in .m4a format work on the iOS simulator but not on a real device. Any thoughts?
