Hi, I’m trying to reproduce the reported gpt-4o-transcribe results on the FLEURS dataset using the transcriptions endpoint. While the results are very good, I haven’t been able to match the 2.46% WER on the English subset that was reported in the blog post. I wonder if text normalization might be the …
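To illustrate how much text normalization alone can move the score, here is a minimal sketch in plain Python. The `simple_normalize` and `wer` helpers are hypothetical stand-ins written for this example. They are not the Whisper normalizer and not necessarily what was used for the blog post numbers; a real evaluation would likely need a fuller normalizer (numbers, contractions, spelling variants).

```python
import re


def simple_normalize(text: str) -> str:
    """Hypothetical minimal normalizer: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # keep letters, digits, apostrophes
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


reference = "the cat sat on the mat"
hypothesis = "The cat sat on the mat."

raw = wer(reference, hypothesis)
norm = wer(simple_normalize(reference), simple_normalize(hypothesis))
print(f"raw WER: {raw:.3f}, normalized WER: {norm:.3f}")
```

In this toy case, casing and a trailing period account for every counted error, so the unnormalized score is inflated by formatting differences rather than recognition mistakes.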

Hi! Do you mind sharing your script for evaluating the datasets? I’ve been trying to evaluate on Hugging Face datasets too, but I’m getting a high WER with gpt-4o-transcribe and gpt-4o-mini-transcribe, while I get good results with whisper-1.

Here is a simplified version of my code that you can use to evaluate:

import os
import tempfile
import concurrent.futures
from tqdm import tqdm
from datasets import load_dataset
import openai
import evaluate
import soundfile as sf
from whisper_normalizer.english import EnglishTextNormalizer
api_ke…
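Since the snippet above is cut off, the general shape of such an evaluation loop can be sketched as follows. Everything here is an assumption for illustration and not the poster's actual code: `transcribe_fn` is a hypothetical callable that would wrap the transcription API call, `fake_transcribe` is a stub so the example runs offline, and the WER is computed with a hand-written edit distance rather than the `evaluate` library.

```python
from concurrent.futures import ThreadPoolExecutor


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ref_word != hyp_word)))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Pooled WER: total word errors over total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words if words else 0.0


def evaluate_samples(samples, transcribe_fn, max_workers=4):
    """Transcribe samples concurrently, then score them against the references."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so hypotheses align with references
        hypotheses = list(pool.map(lambda s: transcribe_fn(s["audio"]), samples))
    references = [s["text"] for s in samples]
    return corpus_wer(references, hypotheses)


def fake_transcribe(audio):
    """Hypothetical stub standing in for a real transcription request."""
    canned = {"a.wav": "hello world", "b.wav": "good morning everyone"}
    return canned[audio]


samples = [
    {"audio": "a.wav", "text": "hello world"},
    {"audio": "b.wav", "text": "good morning to everyone"},
]
score = evaluate_samples(samples, fake_transcribe)
print(f"corpus WER: {score:.3f}")
```

Pooling errors over the whole set weights each utterance by its reference length, which can differ noticeably from averaging per-utterance WER when a dataset has many short segments. Both references and hypotheses would normally be normalized before scoring.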

[Screenshot 2025-03-27 11.11.43 AM] This is my WER result on the FLEURS dataset (test set). On other datasets, however (multi-speaker meetings and noisy background scenarios), Whisper-1 performs much better than GPT-4o-Transcribe and GPT-4o-Mini-Transcribe.

I also found very poor results on AMI (IHM subset). I didn’t evaluate on the entire subset, but found WERs above 40% for both gpt-4o-transcribe and gpt-4o-mini-transcribe with English hints. I wanted to figure out whether I was making a mistake with FLEURS before putting too much stock in those findings…

Reproducing gpt-4o-transcribe FLEURS results

Steveeeeeeen April 1, 2025, 1:19pm 5

This is what I’ve got on the en subset of the FLEURS dataset. However, when benchmarking on other datasets such as TED-LIUM or AMI, I am getting really poor results. Have you tried other datasets?
