Can't use mp3 on whisper model

I’m using the API from AI Speech to transcribe files (speech-to-text). With .wav files it works, but with mp3 I get “Transcription failed: The recordings URI contains invalid data”. I have tried several different mp3 files and always get the same error, but never with wav.
Why can’t I use mp3? Do I have to activate something to be able to use mp3?

MP3 is one of the inputs that IS accepted.

I can see a few possibilities:

  • the file is corrupted, not starting on an mp3 frame
  • the file starts with a malformed or broken ID3v2 tag (ID3v1 tags sit at the end of the file, ID3v2 tags at the start)
  • the sample rate or number of channels isn’t supported.

These are all possibilities if the mp3 is just existing “found audio” instead of encoded by you.

On Windows, a free tool you might experiment with is mp3DirectCut, which edits an mp3 without re-encoding. Using that tool, you could chop the start and end off an mp3, ensuring that it starts at a frame boundary.

The utility can check for “resyncs” (problems in the stream). It also shows the bitrate, sample rate, and channel count, and lets you see whether ID3 tags are present.
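
If you would rather check the same things from a script, here is a minimal sketch that reads the raw bytes, skips a leading ID3v2 tag, and decodes the first plausible MPEG audio frame header. The names inspect_mp3 and id3v2_length are made up for this example, and it only validates a single header, so a false sync inside junk data is possible; treat the result as a hint, not a verdict.

```python
# Sample rates in Hz, keyed by the 2-bit MPEG version field of the frame header.
SAMPLE_RATES = {
    3: (44100, 48000, 32000),  # MPEG-1
    2: (22050, 24000, 16000),  # MPEG-2
    0: (11025, 12000, 8000),   # MPEG-2.5
}


def id3v2_length(data):
    """Return the total byte length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    size = 0
    for byte in data[6:10]:  # four "synchsafe" bytes, 7 useful bits each
        size = (size << 7) | (byte & 0x7F)
    footer = 10 if data[5] & 0x10 else 0  # optional 10-byte footer flag
    return 10 + size + footer


def inspect_mp3(data):
    """Find the first plausible frame header after any ID3v2 tag.

    Returns a dict, or None if no frame header was found.
    """
    tag_len = id3v2_length(data)
    for i in range(tag_len, len(data) - 3):
        # A frame header starts with 11 set bits (the sync word).
        if data[i] != 0xFF or (data[i + 1] & 0xE0) != 0xE0:
            continue
        version = (data[i + 1] >> 3) & 0x03
        layer = (data[i + 1] >> 1) & 0x03
        bitrate_index = (data[i + 2] >> 4) & 0x0F
        rate_index = (data[i + 2] >> 2) & 0x03
        # Reserved or invalid field values mean this was not a real header.
        if version == 1 or layer == 0 or bitrate_index == 15 or rate_index == 3:
            continue
        channel_mode = (data[i + 3] >> 6) & 0x03  # 3 means single channel
        return {
            "id3v2_bytes": tag_len,
            "junk_bytes": i - tag_len,  # non-audio bytes before the first frame
            "sample_rate": SAMPLE_RATES[version][rate_index],
            "channels": 1 if channel_mode == 3 else 2,
        }
    return None
```

To use it, read the file with open("joke.mp3", "rb") and pass the bytes to inspect_mp3. A non-zero junk_bytes value suggests the audio does not start on a frame boundary, and a None result suggests the file is not a valid mp3 at all.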

The API has also been refusing files over about 15 MB, contrary to the documented 25 MB limit, but in that case the error should say “too big”.
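
To rule out size before uploading, a quick check along these lines may help. The helper name upload_size_ok is hypothetical, and the default limit is just the documented 25 MB figure mentioned above; lower it if you see rejections at smaller sizes.

```python
import os


def upload_size_ok(path, limit_mb=25.0):
    """Return (ok, size_mb) for the file at path, measured against limit_mb."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb <= limit_mb, size_mb
```

For example, upload_size_ok("joke.mp3", limit_mb=15) tests against the stricter limit observed in practice.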

A Python code snippet that works dandy right now on mp3 - or any compliant file:

from openai import OpenAI
client = OpenAI()

input_file_path = "joke.mp3"
output_file_path = input_file_path + "-transcript.txt"

# Open the audio file
with open(input_file_path, "rb") as audio_file:
    # Create a transcription using OpenAI API
    try:
        transcription = client.audio.transcriptions.create(
            file=audio_file, # a flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm
            language="en",   # ISO code
            model="whisper-1",
            prompt="Welcome to our radio show.",  # lead-up to input audio
            response_format="json",  # also text, srt, verbose_json, or vtt
            temperature=0.2)
    except Exception as e:
        print(f"An API error occurred: {e}")
        raise SystemExit(1)  # stop here: there is no transcription to save

# get just the transcribed text out of the response
transcribed_text = transcription.text

# bonus code for saving/showing
# Save the transcribed text to a file
try:
    with open(output_file_path, "w", encoding="utf-8") as file:
        file.write(transcribed_text)
    print(f"--- Transcribed text successfully saved to '{output_file_path}'.")
except Exception as e:
    print(f"output file error: {e}")

# a function to print an excerpt of a string
def elide_text(text, start=240, end=240, ellipsis='\n...\n'):
    if len(text) <= start + end:
        return text
    return text[:start] + ellipsis + text[-end:]

# print transcription to confirm success
print(elide_text(transcribed_text))

Thank you! I tried what you said and it still did not work. Then I checked my diarization variable and saw I had it turned on. The problem was not the mp3 format; it was that stereo audio and diarization are not compatible!

Still, thank you for your help!
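
For anyone who hits the same stereo/diarization conflict and wants to keep diarization on, one option is to downmix the file to mono before uploading. Below is a rough sketch that shells out to ffmpeg, assuming ffmpeg is installed and on the PATH; the helper names mono_command and to_mono are invented for this example. The -ac 1 option asks ffmpeg for a single output channel, and -y overwrites an existing output file.

```python
import shutil
import subprocess


def mono_command(src, dst):
    """Build an ffmpeg command line that downmixes src to one channel in dst."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", dst]


def to_mono(src, dst):
    """Run the downmix; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(mono_command(src, dst), check=True)
    return dst
```

Calling to_mono("joke.mp3", "joke-mono.mp3") would write a mono copy next to the original. Note that this re-encodes the audio, so it is worth starting from the best-quality source you have.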