GPT-4o-mini-transcribe and GPT-4o-transcribe not as good as Whisper

We recently migrated from Whisper to the new voice-to-text API (gpt-4o-transcribe / gpt-4o-mini-transcribe), but ran into significant latency issues and unstable transcription results, with text frequently going missing. Because of these problems, we reverted to Whisper. Has anyone else experienced similar issues with the new API?
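
For context, the migration was essentially just swapping the model name on the same transcription endpoint. A minimal sketch, assuming the Python openai SDK; "audio.wav" is a placeholder file name:

 from openai import OpenAI

 client = OpenAI()

 # Same endpoint for both models; only the model name changes.
 with open("audio.wav", "rb") as f:
     new = client.audio.transcriptions.create(model="gpt-4o-transcribe", file=f)

 with open("audio.wav", "rb") as f:
     old = client.audio.transcriptions.create(model="whisper-1", file=f)

 print(new.text)
 print(old.text)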

Well… in my experience, GPT-4o-Transcribe works better than Whisper-1: it doesn’t try to transcribe background noise or produce alien-sounding, broken language. So, for my use case, it works well and is already in prod.

In my experience, the new GPT transcribe models tend to drop words, especially at the beginning/end of the message. I am usually dealing with short messages. Here are my results:

 "RECORDING_TRANSCRIPT": {
  "gpt-4o-mini-transcribe": "Will this work or not?",
  "gpt-4o-transcribe": "Will this work or not?",
  "whisper-1": "Uh, will this work or not? I think so. Bye."
 }

The whisper-1 transcript is 100% accurate to what was said. You can see the two 4o models agree with each other, but both chopped off words at the start and end.

Also, don’t forget about latency … whisper-1 is the fastest of the three, too:

 "TRANSCRIPTION_ENGINE": "openai:{'models': ['whisper-1', 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe']}",
 "TRANSCRIPT_METADATA": {
  "gpt-4o-mini-transcribe": {
   "latency_ms": 2016,
   "transcribed_at": "2025-04-08T06:13:49.816574"
  },
  "gpt-4o-transcribe": {
   "latency_ms": 1598,
   "transcribed_at": "2025-04-08T06:13:47.799742"
  },
  "whisper-1": {
   "latency_ms": 857,
   "transcribed_at": "2025-04-08T06:13:46.201050"
  }
 }

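A side-by-side like this is easy to reproduce. Here’s a minimal sketch, assuming the Python openai SDK; the model list is from the run above, and "recording.wav" is a placeholder file name:

 import time
 from datetime import datetime, timezone

 from openai import OpenAI

 client = OpenAI()
 models = ["whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"]
 metadata = {}

 for model in models:
     # Re-open the file for each call so every model reads from the start.
     with open("recording.wav", "rb") as audio:
         start = time.monotonic()
         result = client.audio.transcriptions.create(model=model, file=audio)
     metadata[model] = {
         "text": result.text,
         "latency_ms": round((time.monotonic() - start) * 1000),
         "transcribed_at": datetime.now(timezone.utc).isoformat(),
     }

(Single requests are noisy, so exact numbers like the ones above will vary from run to run.)
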
So, overall, I’m still liking whisper for these short messages.

I evaluated both for use at my company (we need to transcribe TV ads), and whisper-1 seemed a lot better than gpt-4o-transcribe on the specific “edge case” tests we ran. I don’t think gpt-4o-transcribe is quite up to par for our use case, though for something like transcribing a work meeting over the phone in a noisy coffee shop it may be good enough… We need high accuracy and precision because we search ads for key “banned words” like “Superbowl” (among other things).
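
To make that concrete, the banned-word check is essentially a whole-word, case-insensitive scan over each transcript. A sketch; the helper and every term except “Superbowl” are made up for illustration:

 import re

 # Illustrative list: "Superbowl" is a real example from our screening,
 # the other terms are placeholders.
 BANNED_WORDS = ["Superbowl", "guaranteed winner", "risk-free"]

 def find_banned_words(transcript: str) -> list[str]:
     """Return the banned terms that appear as whole words, ignoring case."""
     hits = []
     for term in BANNED_WORDS:
         if re.search(rf"\b{re.escape(term)}\b", transcript, re.IGNORECASE):
             hits.append(term)
     return hits

If the model silently drops a chunk of audio, any banned word inside that chunk is missed, which is why dropped text is a dealbreaker for us.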

Here are some screenshots of our evaluations. It’s not an exhaustive test, and our grading criteria are a little arbitrary, but I think it’s enough to make us hold off for a bit on switching to the gpt-4o-transcribe API:

(Never mind, OpenAI only lets me post a single screenshot lol)

In some cases, huge amounts of the transcription were dropped: