A speech recording was transcribed (ASR) through the OpenAI API using the openai.Audio.transcribe() method, giving a WER of 9%.
The same audio was processed with the open-source Whisper library, using the whisper-large-v2 model (the latest model, as stated) and the model.transcribe() method, and the result was a WER of 25%!
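For reference, the two calls look roughly like this (the file name and options are illustrative):

```python
import openai   # openai Python SDK < 1.0, which provides openai.Audio
import whisper  # open-source whisper package

# Hosted Audio API: the model is selected as "whisper-1"
with open("speech.wav", "rb") as f:  # file name is illustrative
    api_result = openai.Audio.transcribe("whisper-1", f)
print(api_result["text"])

# Local open-source Whisper, loading large-v2 explicitly
model = whisper.load_model("large-v2")
local_result = model.transcribe("speech.wav")
print(local_result["text"])
```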
What is the difference? According to the API documentation, the OpenAI Audio API uses exactly the same whisper-large-v2 model. Is there any prompt engineering applied in the standard OpenAI API method?
How can I reproduce the same model/results from the OpenAI API using the open-source Whisper library?
It could be multiple things. The main one may be that the OpenAI API loads the model with different parameters, i.e. anything that affects preprocessing or decoding accuracy. I obviously do not know how exactly the requests are processed, so this is just a guess.
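For example, the open-source whisper package lets you set a number of decoding options that can noticeably change accuracy; the hosted endpoint's internal settings are not published, so the values below are only illustrative:

```python
import whisper

model = whisper.load_model("large-v2")

# Decoding options of the open-source package that commonly affect WER.
# The values below are illustrative (mostly the whisper CLI defaults), not
# the unpublished settings used by the hosted API.
result = model.transcribe(
    "speech.wav",                     # file name is illustrative
    language="en",                    # skip automatic language detection
    temperature=0.0,                  # fallback temperatures still apply internally
    beam_size=5,                      # beam search at temperature 0
    best_of=5,                        # candidates when falling back to sampling
    condition_on_previous_text=True,  # carry context across 30-second windows
    initial_prompt=None,              # optional text to bias the vocabulary
    fp16=True,                        # half precision (set False on CPU)
)
print(result["text"])
```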
If you call the Speech-to-Text API, you will get a result from Whisper large-v2. From the documentation:
The Audio API provides two speech to text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model.
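For completeness, the hosted endpoint only exposes a handful of request parameters (prompt, language, temperature, response_format); a minimal sketch of a call setting them explicitly, with an illustrative file name, could look like this:

```python
import openai

# The hosted endpoint exposes only a few request parameters; everything else
# (decoding strategy, chunking, etc.) is handled server-side.
with open("speech.wav", "rb") as f:  # file name is illustrative
    result = openai.Audio.transcribe(
        model="whisper-1",           # hosted Whisper large-v2
        file=f,
        language="en",               # optional ISO-639-1 hint
        prompt="",                   # optional text to guide transcription style
        temperature=0,               # sampling temperature
        response_format="json",
    )
print(result["text"])
```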