Whisper issues with mp4 saved by Safari

Hello there, I’m having a weird issue!
I’ve been trying to build a prototype service that uses MediaRecorder to record voice in the browser, then uses the Python openai client to transcribe that audio with Whisper.

The weird part is that the mp4 file generated by a Chrome-variant browser works perfectly, while the one generated by Safari (both mobile and desktop) cannot be processed properly.

iOS/Safari:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1_recorded_audio_1701849225592.mp4':
  Metadata:
    major_brand     : iso5
    minor_version   : 1
    compatible_brands: isomiso5hlsf
    creation_time   : 2023-12-06T07:53:40.000000Z
  Duration: 00:00:04.95, start: 0.000000, bitrate: 188 kb/s
  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 185 kb/s (default)
    Metadata:
      creation_time   : 2023-12-06T07:53:40.000000Z
      handler_name    : Core Media Audio
      vendor_id       : [0][0][0][0]

Chrome:

Input #0, matroska,webm, from '1_recorded_audio_1701849299527.mp4':
  Metadata:
    encoder         : Opera
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0(eng): Audio: opus, 48000 Hz, mono, fltp (default)

oh… maybe i got it.

While I was writing the post, I looked at the ffprobe output again, and it seems that the MediaRecorder API on iOS saves the files using the AAC codec :sweat_smile:

I think this is the main issue. I’ll leave the discussion open for the community in case somebody has the same problem.
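In case it helps anyone compare their own recordings, here is a minimal sketch that reads the container and audio codec from Python instead of eyeballing the ffprobe text. It assumes ffprobe is on the PATH; the function names `probe_audio` and `summarize_probe` are made up for this example.

```python
import json
import subprocess


def summarize_probe(ffprobe_json: str) -> dict:
    """Pull the container name and audio codec names out of ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    codecs = [
        stream.get("codec_name")
        for stream in data.get("streams", [])
        if stream.get("codec_type") == "audio"
    ]
    return {
        "container": data.get("format", {}).get("format_name"),
        "audio_codecs": codecs,
    }


def probe_audio(path: str) -> dict:
    """Run ffprobe on `path` and return its container and audio codecs."""
    command = [
        "ffprobe", "-v", "error",          # only print real errors
        "-show_format", "-show_streams",   # include container and stream info
        "-of", "json",                     # machine-readable output
        path,
    ]
    # check=True raises CalledProcessError if ffprobe cannot read the file
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return summarize_probe(result.stdout)
```

For the two files above, this should report an mp4-family container with aac for the Safari recording and matroska,webm with opus for the Chrome one, which matches the ffprobe dumps.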

Advice is welcome!

If possible, convert your audio data with ffmpeg in the backend before sending it to Whisper. This will fix the issue.


Quick update:

  • I applied this solution.
  • Python has a “subprocess” module that can launch processes at the system level, so the backend can call ffmpeg directly.
  • The server I’m using has ffmpeg installed.
  • The use case is to record quick voice notes and turn them into a to-do list, so using ffmpeg is perfectly acceptable.
  • It works cross-system!

Case closed!

Not for me... same process, with ffmpeg in a Python backend, but I still get the same 404 error. Which format is ffmpeg converting to?

I’m using mp3 as the target format and it works!

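import subprocess

def convert_audio(input_file: str, output_format: str = "mp3") -> str:
    # Swap the extension, e.g. "note.mp4" -> "note.mp3"
    output_file = f"{input_file.rsplit('.', 1)[0]}.{output_format}"
    # -y overwrites an existing output file instead of waiting for a prompt
    command = ['ffmpeg', '-y', '-i', input_file, output_file]
    # check=True raises CalledProcessError if ffmpeg exits with an error
    process = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)

    #app_logger.info(f"ffmpeg output: {process.stdout.decode()}")
    #app_logger.error(f"ffmpeg errors: {process.stderr.decode()}")

    return output_file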
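
For completeness, a rough sketch of handing the converted file to Whisper. This is not taken from my actual service: the helper name `transcribe` is made up, and it assumes the openai Python package (version 1.x) with the `whisper-1` model. The client is passed in as a parameter so the helper itself has no hard dependency on the package.

```python
from pathlib import Path


def transcribe(audio_path: str, client, model: str = "whisper-1") -> str:
    """Send an audio file to the transcription endpoint and return the text.

    `client` is assumed to be an `openai.OpenAI()` instance (openai >= 1.0).
    """
    # The API expects a binary file object, so open the file in "rb" mode
    with Path(audio_path).open("rb") as audio_file:
        response = client.audio.transcriptions.create(model=model, file=audio_file)
    return response.text


# Usage (assumes OPENAI_API_KEY is set; "voice_note.mp4" is a placeholder name):
#   from openai import OpenAI
#   text = transcribe(convert_audio("voice_note.mp4"), OpenAI())
```

Converting first and transcribing second keeps the two steps separate, so a failing ffmpeg run and a failing API call are easy to tell apart.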

Thank you, I will try.
Had to give up yesterday and use another service, very irritating.