We recently migrated from Whisper to the new voice-to-text API but encountered significant latency and unstable transcription results, frequently with missing text. Due to these issues, we reverted to Whisper. Has anyone else experienced similar problems with the new API?
Well… in my experience, it seems that GPT-4o-Transcribe works better than Whisper-1, as it doesn’t try to transcribe background noise or produce an alien-like, broken language. So, for my use, it works well and it is already in prod.
In my experience, the new GPT transcribe models tend to drop words, especially at the beginning/end of the message. I am usually dealing with short messages. Here are my results:
"RECORDING_TRANSCRIPT": {
"gpt-4o-mini-transcribe": "Will this work or not?",
"gpt-4o-transcribe": "Will this work or not?",
"whisper-1": "Uh, will this work or not? I think so. Bye."
}
The whisper-1 version is 100% correct in what was said. You can see the two 4o models agree with each other, but both chopped off the surrounding words.
Also, don’t forget about latency: whisper is the fastest model of the three, too:
"TRANSCRIPTION_ENGINE": "openai:{'models': ['whisper-1', 'gpt-4o-transcribe', 'gpt-4o-mini-transcribe']}",
"TRANSCRIPT_METADATA": {
"gpt-4o-mini-transcribe": {
"latency_ms": 2016,
"transcribed_at": "2025-04-08T06:13:49.816574"
},
"gpt-4o-transcribe": {
"latency_ms": 1598,
"transcribed_at": "2025-04-08T06:13:47.799742"
},
"whisper-1": {
"latency_ms": 857,
"transcribed_at": "2025-04-08T06:13:46.201050"
}
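If anyone wants to run this kind of side-by-side comparison themselves, here is a minimal sketch of how I time each model. It assumes the official `openai` Python SDK with an `OPENAI_API_KEY` in the environment; the file name `short_message.wav` is just a placeholder for your own audio clip:

```python
import time

def benchmark(transcribe, audio_path):
    """Time a single transcription call; return (text, latency in ms)."""
    start = time.perf_counter()
    text = transcribe(audio_path)
    latency_ms = round((time.perf_counter() - start) * 1000)
    return text, latency_ms

if __name__ == "__main__":
    # Requires: pip install openai, and OPENAI_API_KEY set in the environment.
    from openai import OpenAI
    client = OpenAI()

    def make_transcriber(model):
        def transcribe(path):
            with open(path, "rb") as f:
                return client.audio.transcriptions.create(model=model, file=f).text
        return transcribe

    for model in ["whisper-1", "gpt-4o-transcribe", "gpt-4o-mini-transcribe"]:
        text, ms = benchmark(make_transcriber(model), "short_message.wav")
        print(f"{model}: {ms} ms -> {text!r}")
```

Note this measures wall-clock round-trip time per request, run sequentially, so network conditions will add noise; averaging over several runs per model gives a fairer picture.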
So, overall, I’m still liking whisper for these short messages.