Whisper issues with mp4 saved by Safari

RiavvioAS · December 6, 2023, 8:04am

Hello there, I’m having a weird issue!
I’ve been trying to build a prototype service that uses MediaRecorder to record voice in the browser, then uses the Python OpenAI client to transcribe that audio with Whisper.

The weird part is that the mp4 file generated in a Chromium-based browser works perfectly, while the one recorded in Safari (both mobile and desktop) can’t be processed properly.

IOS/SAFARI:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from ‘1_recorded_audio_1701849225592.mp4’:
Metadata:
major_brand : iso5
minor_version : 1
compatible_brands: isomiso5hlsf
creation_time : 2023-12-06T07:53:40.000000Z
Duration: 00:00:04.95, start: 0.000000, bitrate: 188 kb/s
Stream #0:00x1: Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 185 kb/s (default)
Metadata:
creation_time : 2023-12-06T07:53:40.000000Z
handler_name : Core Media Audio
vendor_id : [0][0][0][0]

CHROME

Input #0, matroska,webm, from ‘1_recorded_audio_1701849299527.mp4’:
Metadata:
encoder : Opera
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

Oh… maybe I got it.

While I was writing the post, I looked at the ffprobe output, and it seems that MediaRecorder on iOS saves the file with the AAC codec in an MP4 container, while the Chrome recording is actually Opus in a WebM container (despite the .mp4 extension).

I think this is the main issue. I’ll leave the discussion up in case somebody in the community has the same problem.

Advice is welcome!
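
For anyone who wants to run the same check on the server instead of by eye, here is a minimal sketch that asks ffprobe for the container and codec names as JSON. The helper names (`ffprobe_command`, `parse_probe`, `probe_audio`) are made up for illustration, and it assumes ffprobe is installed and on the PATH.

```python
from __future__ import annotations

import json
import subprocess


def ffprobe_command(path: str) -> list[str]:
    """Build an ffprobe call that prints container and codec names as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=format_name:stream=codec_name",
        "-of", "json",
        path,
    ]


def parse_probe(raw_json: str) -> tuple[str, list[str]]:
    """Extract (container name, list of stream codec names) from ffprobe JSON."""
    data = json.loads(raw_json)
    container = data.get("format", {}).get("format_name", "")
    codecs = [stream.get("codec_name", "") for stream in data.get("streams", [])]
    return container, codecs


def probe_audio(path: str) -> tuple[str, list[str]]:
    """Run ffprobe on a file (assumes ffprobe is on PATH) and parse its output."""
    result = subprocess.run(
        ffprobe_command(path), capture_output=True, text=True, check=True
    )
    return parse_probe(result.stdout)
```

Going by the ffprobe output above, the Safari file should come back as an mp4-family container with an aac stream, and the Chrome file as matroska,webm with opus.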

supershaneski · December 6, 2023, 8:11am

If possible, process your audio data with ffmpeg on the backend. This will fix the issue.

RiavvioAS · December 11, 2023, 2:22pm

Quick update:

I applied this solution:
Python has a “subprocess” module that can launch processes at the system level.
The server I’m using has ffmpeg installed.
The use case is recording quick voice notes and turning them into a to-do list, so using ffmpeg is perfectly acceptable.
It works cross-system!
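
Since this approach depends on ffmpeg being present on the server, a small guard can fail early with a clear message when it is not. This is only a sketch; `require_tool` is a hypothetical helper name, not part of any library.

```python
import shutil


def require_tool(name: str = "ffmpeg") -> str:
    """Return the full path of an executable on PATH, or raise if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} was not found on PATH; install it before converting audio"
        )
    return path
```

Calling `require_tool()` once at startup is enough to surface a missing install before the first upload arrives.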

Case closed!

SeccoJones · December 15, 2023, 10:51am

Not for me… same process, with ffmpeg in a Python backend, but I still get the same 404 error. Which format is ffmpeg converting to?

RiavvioAS · December 16, 2023, 8:53am

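I’m using mp3 as the target format and it works!

import subprocess

def convert_audio(input_file: str, output_format: str = "mp3") -> str:
    # Swap the extension, e.g. recording.mp4 -> recording.mp3
    output_file = f"{input_file.rsplit('.', 1)[0]}.{output_format}"
    # -y overwrites an existing output file instead of waiting for a prompt
    command = ['ffmpeg', '-y', '-i', input_file, output_file]
    # check=True raises CalledProcessError if ffmpeg exits with an error
    process = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)

    #app_logger.info(f"ffmpeg output: {process.stdout.decode()}")
    #app_logger.error(f"ffmpeg errors: {process.stderr.decode()}")

    return output_file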
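
Once a file has been converted, handing it to Whisper might look roughly like the sketch below. It assumes the v1 `openai` Python client with `OPENAI_API_KEY` set in the environment. `mp3_path_for` and `transcribe_file` are hypothetical names for illustration, not the code actually running in the service.

```python
from pathlib import Path


def mp3_path_for(input_file: str) -> str:
    """Return the input path with its extension swapped for .mp3."""
    return str(Path(input_file).with_suffix(".mp3"))


def transcribe_file(audio_path: str) -> str:
    """Send an audio file to the Whisper API and return the transcript text."""
    # Imported lazily so the path helper works without the openai package installed.
    from openai import OpenAI

    client = OpenAI()  # assumption: the API key comes from OPENAI_API_KEY
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

The idea is to run the ffmpeg conversion first and call `transcribe_file` on the resulting mp3, so Safari and Chrome uploads go through the same path.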

SeccoJones · December 16, 2023, 9:06am

Thank you, I will try.
Had to give up yesterday and use another service, very irritating.
