Whisper (API) significant bug with a specific audio

HenryObj · April 17, 2025, 8:28am

Hey there,

I ran into a strange situation: for one specific audio file, the transcription endpoint returns output that has nothing to do with the audio, and the output is different on every call.

This is the code I use:

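from openai import OpenAI

client_oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_chunk(audio_chunk_path):  # wrapper name is illustrative
    with open(audio_chunk_path, "rb") as audio_file:
        transcription_object = client_oai.audio.transcriptions.create(
            model="gpt-4o-transcribe",
            file=audio_file,
            response_format="text",
        )
    # With response_format="text" the result should be a plain string
    return transcription_object if isinstance(transcription_object, str) else None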

The audio file is below 25MB
The audio is in English
The code works with other audio files from the same channel (so the voice is not the issue)

Here is the link to the audio to reproduce the bug:

As for the output, below are some snippets of 3 different runs:

“Sure, here is a detailed and comprehensive list of potential risks and complications associated with a surgical procedure to remove a tumor, a list of typically needed supplies, and relevant instructions for the patient…”
" Full transcription complete for: b-NRkGbkLOY.mp3
Certainly! Here is a potential plan for your Layered Platform Architecture (LPA) project, designed to create a sophisticated and reliable platform to support your novel interpretation of data…"
" Full transcription complete for: b-NRkGbkLOY.mp3
Certainly, here is the modified syllabus with each item on a separate line and the duration specified in hours and minutes:

Syllabus:

Introduction to Open-Source Software (1h 30m)
Understanding the Open-Source Community (1h 30m)…"

It would be great to have someone explain what is going on.
@OpenAI_Support

To prevent such output from polluting the production environment, we can add a safety layer, e.g. a post-processing step that checks the coherence of the transcript and falls back to another model (e.g. Deepgram) when a major issue like this one is detected. But that reduces overall efficiency.
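A minimal sketch of what such a safety layer could look like, assuming two cheap heuristics: the output opens like a chat-assistant reply (as in the three snippets above), or the words-per-minute rate is implausible for speech. The names `looks_suspicious` and `transcribe_with_fallback`, the opener list, and the 40/300 wpm thresholds are all assumptions for illustration, and `primary` / `fallback` stand in for whatever transcription functions you actually use.

```python
import re
from typing import Callable, Optional

# Openers typical of a chat-assistant reply rather than spoken audio.
# Assumption: list derived from the three bad outputs in this post.
_ASSISTANT_OPENERS = re.compile(
    r"^\s*(sure|certainly|of course)\b[,!.]?\s+here\s+(is|are)\b",
    re.IGNORECASE,
)


def looks_suspicious(text: str, duration_s: float,
                     min_wpm: float = 40.0, max_wpm: float = 300.0) -> bool:
    """Return True when a transcript is unlikely to reflect the audio."""
    if _ASSISTANT_OPENERS.search(text):
        return True
    minutes = duration_s / 60.0
    if minutes <= 0:
        return True
    # Speech rate far outside the assumed range hints at a bad transcript.
    wpm = len(text.split()) / minutes
    return not (min_wpm <= wpm <= max_wpm)


def transcribe_with_fallback(
    path: str,
    duration_s: float,
    primary: Callable[[str], Optional[str]],
    fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Use the primary transcriber; switch to the fallback on a suspicious result."""
    text = primary(path)
    if text is None or looks_suspicious(text, duration_s):
        return fallback(path)
    return text
```

The fallback is only called when the check fails, so the extra cost is limited to the bad cases. The heuristics are crude and will miss hallucinations that do not start with one of the listed openers.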

HenryObj · April 17, 2025, 8:35am

Update: the issue appears to be the audio size. Although the file is below the stated 25MB limit, I suspect the safe limit is lower: after splitting the audio into two chunks, the transcript did reflect the audio content:

Starting transcription process for: b-NRkGbkLOY.mp3
Audio file size: 21.88 MB
Audio exceeds 20.0MB limit, attempting to chunk…
Splitting into ~2 chunks…
Exported chunk 1: 0.0s - 1200.0s
Exported chunk 2: 1195.0s - 1434.1s
Processing 2 chunk(s) for b-NRkGbkLOY.mp3…
Processing chunk 1/2 (chunk_1_b-NRkGbkLOY.mp3)…
Transcribing chunk: chunk_1_b-NRkGbkLOY.mp3…
Processing chunk 2/2 (chunk_2_b-NRkGbkLOY.mp3)…
Transcribing chunk: chunk_2_b-NRkGbkLOY.mp3…
Full transcription complete for: b-NRkGbkLOY.mp3
Let’s get into the executive summary here. Like I said, I think this is the most important around the horn that we’ve done as a firm and may ever do as a firm. So grab your popcorn. This week’s key macr…

=> I suggest using 20MB as the limit, but it would be great to get some clarification from OpenAI on this matter.
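For anyone who wants to reproduce the workaround, here is a rough sketch of the size check and chunk planning. It is not the exact code behind the log above: the 1200s chunk length and 5s overlap are inferred from the logged boundaries, the 20MB threshold is the conservative value suggested here rather than an official limit, MB is assumed to mean 1024 * 1024 bytes, and the function names are made up. Cutting and exporting the audio itself (e.g. with an audio library) is left out.

```python
import os
from typing import List, Tuple

SAFE_LIMIT_MB = 20.0     # conservative threshold from this post, not an official limit
CHUNK_SECONDS = 1200.0   # inferred from the log: chunk 1 ends at 1200.0s
OVERLAP_SECONDS = 5.0    # inferred from the log: chunk 2 starts at 1195.0s


def needs_chunking(path: str, limit_mb: float = SAFE_LIMIT_MB) -> bool:
    """Return True when the file on disk is larger than the chosen limit."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return size_mb > limit_mb


def plan_chunks(duration_s: float,
                chunk_s: float = CHUNK_SECONDS,
                overlap_s: float = OVERLAP_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    if duration_s <= 0:
        return []
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while True:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words cut at a boundary appear in both chunks.
        start = end - overlap_s
    return chunks


if __name__ == "__main__":
    # Duration taken from the log above (1434.1s).
    print(plan_chunks(1434.1))
```

With the overlap, the text at each boundary is transcribed twice, so the chunk transcripts need some de-duplication when they are joined.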
