Whisper API cannot read files correctly

I modified the file suffix and the OpenAI API accepted the input.

The following snippet helped with changing the suffix:

from pathlib import Path

import openai

def transcribe(audio):
    print(audio)

    # Rename the file so it carries a .wav suffix that the API accepts
    myfile = Path(audio)
    myfile = myfile.rename(myfile.with_suffix('.wav'))

    with open(myfile, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    print(transcript)

After an absurd amount of trial and error I’ve found GitHub - kbumsik/opus-media-recorder: MediaRecorder polyfill for Opus recording using WebAssembly, which can record webm audio entirely client-side and send it to OpenAI.


Having a similar issue with Safari on Mac 12.6.3. Audio from Chrome can be submitted without issue, as long as it is saved first. If I transmit the blob directly via my Flask app, I get the ‘Invalid file format’ error regardless of whether I use Chrome or Safari. Taking my app to Windows to see if the issue persists.
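For what it’s worth, a minimal sketch of the “saved first” path that works, assuming a Flask endpoint that writes the uploaded blob to a temp file before calling Whisper (the route and form field names are illustrative, not from the original post):

import tempfile

import openai
from flask import Flask, request

app = Flask(__name__)

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # Save the uploaded blob to disk with a real extension first,
    # rather than streaming the raw blob straight to the API
    blob = request.files["audio"]
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as f:
        blob.save(f.name)
    with open(f.name, "rb") as audio_file:
        return openai.Audio.transcribe("whisper-1", audio_file).text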

Can we get an update here from OpenAI?


Same here,
I want to test Whisper from a Databricks notebook, but it doesn’t recognize the audio files.
Hopefully we get a fix, because I spent quite some time testing different things, none of which worked.

This API is the only thing that works for me. I have tried opus-media-recorder as well but couldn’t get it to work.

GitHub - ai/audio-recorder-polyfill: MediaRecorder polyfill to record audio in Edge and Safari just worked

This is working for me with Chrome on Mac; the audio is generated from a Gradio app microphone:

import os
import subprocess
import tempfile

import openai

def transcribe(audio):
    """Transcribe an audio file using OpenAI's Whisper API"""
    # Reserve a temporary .mp3 path for the converted audio
    input_audio = audio
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as temp_mp3:
        output_audio = temp_mp3.name

    # Use ffmpeg to convert the webm file to a 16 kHz mono mp3
    command = ["ffmpeg", "-y", "-i", input_audio,
               "-vn", "-ar", "16000", "-ac", "1", output_audio]
    subprocess.run(command, check=True)

    # Transcribe the audio
    with open(output_audio, "rb") as audio_file:
        transcription = openai.Audio.transcribe("whisper-1", audio_file).text

    # Remove the temporary files
    os.remove(output_audio)
    os.remove(input_audio)

    return transcription

Tried changing the ‘major brand’ bytes at the beginning of the ‘ftyp’ box from [105, 115, 111, 54] (‘iso6’) to [105, 115, 111, 109] (‘isom’), since ‘isom’ is also listed in the compatible brands. This passes the Whisper type check.
As a result, Whisper now sometimes returns a good result and sometimes returns an empty string. :frowning:
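A minimal sketch of that byte patch, assuming the ‘ftyp’ box sits at the very start of the file (as it does in typical Safari MP4 output); the function name is illustrative:

# Hypothetical sketch: rewrite the 'ftyp' major brand from 'iso6' to 'isom'
# so the file passes Whisper's type check. Box layout at the start of an MP4:
# 4-byte box size, 'ftyp', 4-byte major brand, 4-byte minor version, ...
def patch_major_brand(path):
    with open(path, "r+b") as f:
        header = f.read(12)
        if header[4:8] == b"ftyp" and header[8:12] == b"iso6":
            f.seek(8)
            f.write(b"isom")  # b"isom" == bytes([105, 115, 111, 109])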

Same for me, and all this time I thought it was Safari having the trouble. Using pydub to convert the mp4 recorded by Safari to .wav works for me.
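A minimal sketch of that conversion, assuming pydub and ffmpeg are installed and the Safari recording is saved as recording.mp4 (file names illustrative):

from pydub import AudioSegment

# Decode the Safari mp4 recording and re-export it as .wav for Whisper
AudioSegment.from_file("recording.mp4").export("recording.wav", format="wav")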

Same problem here, audio generated by Chrome via HTML5 is accepted by OpenAI Whisper, audio generated by Safari not.

Metadata of Safari generated file vs Chrome generated via ffmpeg:

Safari:

[fferr] Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '6444093fc6f2c.wav':
[fferr]   Metadata:
[fferr]     major_brand     : iso5
[fferr]     minor_version   : 1
[fferr]     compatible_brands: isomiso5hlsf
[fferr]     creation_time   : 2023-04-22T16:20:10.000000Z
[fferr]   Duration: 00:00:05.25, start: 0.000000, bitrate: 187 kb/s
[fferr]     Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 184 kb/s (default)
[fferr]     Metadata:
[fferr]       creation_time   : 2023-04-22T16:20:10.000000Z
[fferr]       handler_name    : Core Media Audio
[fferr] Stream mapping:
[fferr]   Stream #0:0 -> #0:0 (aac (native) -> aac (native))

Chrome:

[fferr] Input #0, matroska,webm, from '6443fbc4d28f7.wav':
[fferr]   Metadata:
[fferr]     encoder         : Chrome
[fferr]   Duration: N/A, start: 0.000000, bitrate: N/A
[fferr]     Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)
[fferr] Stream mapping:
[fferr]   Stream #0:0 -> #0:0 (opus (native) -> aac (native))

Any insights on a solution?

Thank you, this solution appears sensible. Still working well for you?

Meaning, OpenAI’s servers are running Ubuntu LTS with old ffmpeg versions that can’t properly process Safari-generated .wav files?

Experiencing a similar issue on Firefox (though I haven’t checked the type of audio file generated).

Safari doesn’t actually generate wav files; it uses an mp4 encoding. Either way, OpenAI’s servers don’t like it.

Your options are mostly to re-encode on the client side with a polyfill or on the server side with something like pydub. There are also other transcription options out there that don’t have this specific problem.

I have a very strange issue: up to yesterday the OpenAI Whisper API was accepting files recorded from the Voice Memos app on iOS without any issues. Today they’re not working at all, and I keep getting a file format error saying my file format is not supported.

Could this be something they changed on their side? Because I didn’t change anything in my code.

Hi all! We did indeed make a change yesterday to try to support more files as described in this thread. Please let me know if that has helped with things!

@omarsultan - sorry to hear about this. I’ve tried out some voice memo files on my end and they all seem to be working - can you give any more details so I can repro? Maybe iOS version number, file extension, etc?


Hey Michelle,

It turns out that what was causing the crash was some preprocessing I was doing to the file, because it initially didn’t work. But now it just works out of the box with audio files recorded from the Voice Memos app on iOS 16.

Thanks a lot


Thanks to everyone for the responses. It has started working for me now after the change to the API itself.


I still get this error when uploading an MP3 - can you advise?
<class 'openai.error.InvalidRequestError'>: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']


I’m using the record plugin on a Flutter app (record | Flutter Package) to capture audio. The generated audio files were working with the Whisper API until recently. I thought the problem was in how I was sending the request (e.g. MIME types), but I can now verify with two audio files (Android and iOS) that if I send them with the provided curl call, the Android-generated file goes through, but the iOS-generated file is rejected as an invalid format (both files are audio/mp4). Is there any way I can provide the files for testing and improving the API, or do you recommend I transcode the iOS files (as others have done)? What’s the best way to approach this?
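For reference, the standard curl call to the transcription endpoint looks like this (the file name is illustrative):

curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@recording.mp4" \
  -F model="whisper-1"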

@michellep thoughts?

I figured it out: I just changed the file ending on iOS to the same one voice notes use (.m4a) rather than .mp4, even though the file is of type ‘audio/mp4’. On Android it works with the .mp4 file ending. I’m using the record Flutter package.
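A minimal server-side sketch of the same workaround, assuming the upload arrives with a .mp4 suffix (the function name is illustrative, not from the original post):

from pathlib import Path

def fix_ios_extension(path):
    # Whisper accepts iOS 'audio/mp4' recordings when they carry .m4a
    p = Path(path)
    return p.rename(p.with_suffix(".m4a")) if p.suffix == ".mp4" else p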

Whisper API audio files in .m4a format work on the iOS simulator but not on a real device. Any thoughts?
