Can't use mp3 on whisper model

margaridavilacha · April 28, 2024, 10:03pm

I’m using the API from AI speech to transcribe files (speech-to-text). With .wav files it works, but with mp3 I get “Transcription failed: The recordings URI contains invalid data”. I tried several different mp3 files and always get the same error, but never with wav.
Why can’t I use mp3? Do I have to activate something to be able to use mp3?

_j · April 28, 2024, 11:06pm

MP3 is one of the inputs that IS accepted.

I can see a few possibilities:

the file is corrupted and does not start on an mp3 frame
the file starts with a malformed or broken ID3v2 tag (ID3v1 tags sit at the end of the file, not the start)
the sample rate or number of channels isn’t supported.

These are all plausible if the mp3 is existing “found audio” rather than something you encoded yourself.

On Windows, a free tool you might experiment with is mp3DirectCut, which edits an mp3 without re-encoding. Using that tool, you could chop the start and end off an mp3, ensuring that it starts at a frame boundary.

The utility can check for “resyncs” (problems in the stream). It also shows the bitrate, sample rate, and channels, and whether ID3 tags are present.
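If you would rather check the same things from a script, here is a rough sketch that looks only at the first bytes of the file. It is a heuristic, not a validator: it skips a leading ID3v2 tag if one is present, then reports whether the next four bytes look like an MPEG audio frame header, and reads the sample rate and channel count from that header. The function name `inspect_mp3_start` is made up for this example, and a real decoder checks far more than this.

```python
# Sample rates by MPEG version bits in the frame header (value 1 is reserved).
SAMPLE_RATES = {
    3: (44100, 48000, 32000),  # MPEG-1
    2: (22050, 24000, 16000),  # MPEG-2
    0: (11025, 12000, 8000),   # MPEG-2.5
}


def inspect_mp3_start(data: bytes) -> dict:
    """Heuristic look at the first bytes of an mp3 (hypothetical helper)."""
    info = {
        "id3v2_bytes": 0,          # size of a leading ID3v2 tag, if any
        "starts_on_frame": False,  # does audio begin with a plausible frame header?
        "sample_rate": None,
        "channels": None,
    }
    offset = 0

    # An ID3v2 tag starts with "ID3" and stores its size as four 7-bit bytes.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = (
            (data[6] & 0x7F) << 21
            | (data[7] & 0x7F) << 14
            | (data[8] & 0x7F) << 7
            | (data[9] & 0x7F)
        )
        offset = 10 + size
        if data[5] & 0x10:  # ID3v2.4 footer flag adds another 10 bytes
            offset += 10
        info["id3v2_bytes"] = offset

    header = data[offset:offset + 4]
    if len(header) < 4:
        return info

    # Frame sync: 11 set bits at the start of the header.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return info

    version = (header[1] >> 3) & 0x03
    layer = (header[1] >> 1) & 0x03
    rate_index = (header[2] >> 2) & 0x03
    if version == 1 or layer == 0 or rate_index == 3:
        return info  # reserved values, so not a usable frame header

    info["starts_on_frame"] = True
    info["sample_rate"] = SAMPLE_RATES[version][rate_index]
    # Channel mode 3 is mono; stereo, joint stereo and dual channel are all 2.
    info["channels"] = 1 if (header[3] >> 6) & 0x03 == 3 else 2
    return info


# Usage (reads only the start of the file; large tags may need more bytes):
# with open("joke.mp3", "rb") as f:
#     print(inspect_mp3_start(f.read(1_000_000)))
```

If `starts_on_frame` comes back false, trimming the file to a frame boundary as described above is the first thing to try.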

The API has also been refusing files over 15 MB or so, contrary to the documented 25 MB limit, but in that case the error should say “too big”.
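To rule out size before uploading, a small check like the one below can help. This is only a sketch: `check_upload_size` is a made-up helper, the 25 MB default comes from the documentation mentioned above, and treating 1 MB as 1024 * 1024 bytes is my assumption. Pass `limit_mb=15` to test against the lower threshold I have been seeing.

```python
import os
from typing import Tuple


def check_upload_size(path: str, limit_mb: float = 25.0) -> Tuple[int, bool]:
    """Return (size in bytes, whether the file is within limit_mb)."""
    size_bytes = os.path.getsize(path)
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    return size_bytes, size_bytes <= limit_mb * 1024 * 1024


# Usage:
# size, ok = check_upload_size("joke.mp3", limit_mb=15)
# print(f"{size} bytes, within limit: {ok}")
```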

A Python code snippet that works dandy right now on mp3, or on any compliant file:

from openai import OpenAI
client = OpenAI()

input_file_path = "joke.mp3"
output_file_path = input_file_path + "-transcript.txt"

# Open the audio file
with open(input_file_path, "rb") as audio_file:
    # Create a transcription using OpenAI API
    try:
        transcription = client.audio.transcriptions.create(
            file=audio_file, # a flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
            language="en",   # ISO code
            model="whisper-1",
            prompt="Welcome to our radio show.",  # lead-up to input audio
            response_format="json",  # also text, srt, verbose_json, or vtt
            temperature=0.2)
    except Exception as e:
        print(f"An API error occurred: {e}")
        raise SystemExit(1)  # stop here: without a response there is nothing to save

# get just the transcribed text out of the response
transcribed_text = transcription.text

Bonus code for saving/showing:

# Save the transcribed text to a file
try:
    with open(output_file_path, "w", encoding="utf-8") as file:  # utf-8 so non-ASCII text saves on any platform
        file.write(transcribed_text)
    print(f"--- Transcribed text successfully saved to '{output_file_path}'.")
except Exception as e:
    print(f"output file error: {e}")

# a function to print an excerpt of a string
def elide_text(text, start=240, end=240, ellipsis='\n...\n'):
    if len(text) <= start + end:
        return text
    return text[:start] + ellipsis + text[-end:]

# print transcription to confirm success
print(elide_text(transcribed_text))

margaridavilacha · May 12, 2024, 11:10pm

Thank you! I tried what you said and it still did not work. Then I checked my diarization variable and found I had it on. The problem was not the mp3 format: stereo audio and diarization are not compatible!

Still, thank you for your help!
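For anyone who lands here with the same stereo problem and needs to keep diarization on, one option is to downmix the file to mono before uploading. The sketch below assumes ffmpeg is installed and on your PATH, which nobody in this thread has mentioned or tested; the helper names are made up for the example. `-ac 1` sets the output to one audio channel and `-y` overwrites the output file if it exists.

```python
import subprocess
from typing import List


def mono_downmix_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command that writes a single-channel copy of src to dst."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", dst]


def downmix_to_mono(src: str, dst: str) -> None:
    """Run the command; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(mono_downmix_command(src, dst), check=True)


# Usage (assumes ffmpeg is available):
# downmix_to_mono("interview-stereo.mp3", "interview-mono.mp3")
```

Note that this re-encodes the audio, so keep the original file around.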

jesse9 · July 9, 2024, 4:38pm

just curious - which model are you using that has diarization?
