Reproducing gpt-4o-transcribe FLEURS results


This is what I’ve got on the en subset of the FLEURS dataset. However, when benchmarking on other datasets such as TED-LIUM or AMI, I am getting really poor results. Have you tried other datasets?
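Results across datasets are only comparable if every dataset is scored the same way. Below is a minimal sketch of word error rate (WER) scoring, assuming WER is the metric being compared. The `normalize` function is a hypothetical, deliberately simple normalizer (lowercase, strip punctuation except apostrophes). It is not the normalizer used in any official gpt-4o-transcribe or FLEURS evaluation.

```python
import re


def normalize(text: str) -> list[str]:
    # Hypothetical minimal normalizer: lowercase, replace punctuation
    # (except apostrophes) with spaces, then split on whitespace.
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Word-level Levenshtein distance; prev[j] holds the distance between
    # the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    # Utterance-level WER: edits divided by the number of reference words.
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    # Corpus-level WER: total edits over total reference words,
    # rather than an average of per-utterance WERs.
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors += edit_distance(ref, hyp)
        words += len(ref)
    return errors / words


if __name__ == "__main__":
    print(wer("Hello, world!", "hello word"))
    print(corpus_wer([("the cat sat", "the cat sat"), ("a b", "a c")]))
```

Applying the same normalizer to every dataset makes it easier to tell whether a gap comes from the model or from differences in how the reference transcripts are formatted.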