Whisper endpoint doesn't support the latest models?

The OpenAI API docs for audio endpoints state:

model string Required: ID of the model to use. Only whisper-1 is currently available.

Is whisper-1 an older model? Does that mean that I cannot use OpenAI’s API endpoint for newer models such as large-v3? Should I look for other cloud providers that host these models?

The docs say that it’s using large-v2, but they might be outdated.
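For reference, here is roughly what a request to the hosted endpoint looks like. This is a minimal sketch, assuming the official `openai` Python package (v1 or later) and an `OPENAI_API_KEY` in the environment. The helper name `transcribe_hosted` and its `client` parameter are made up for illustration.

```python
from pathlib import Path


def transcribe_hosted(audio_path, client=None, model="whisper-1"):
    """Send one audio file to the hosted transcription endpoint and return the text.

    `transcribe_hosted` is a hypothetical helper name, not part of any library.
    """
    if client is None:
        # Imported lazily so the helper can also be exercised with a stub client.
        from openai import OpenAI  # assumes `pip install openai` (v1+)

        client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    with Path(audio_path).open("rb") as audio_file:
        # The quoted docs list whisper-1 as the only accepted model ID.
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Calling `transcribe_hosted("meeting.mp3")` would return the transcript as a string (the file name is just a placeholder). Passing any other model ID should be rejected if the quoted docs are still accurate.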

If you do lots of processing, self-hosting will likely be cheaper as well, so it’s not a bad idea.
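To put a rough number on "lots of processing", here is a back-of-the-envelope sketch. It assumes the hosted price is $0.006 per audio minute (the whisper-1 list price at the time of writing, so check the current pricing page) and that you rent a dedicated GPU at a flat monthly rate. The function name and the $360 figure are made up for illustration.

```python
# Assumption: hosted whisper-1 list price in USD per audio minute.
API_USD_PER_AUDIO_MINUTE = 0.006


def break_even_audio_hours(gpu_usd_per_month):
    """Monthly audio hours above which a flat-rate GPU costs less than the API.

    Ignores setup time, storage, and whether one GPU can keep up with the load.
    """
    api_usd_per_audio_hour = API_USD_PER_AUDIO_MINUTE * 60
    return gpu_usd_per_month / api_usd_per_audio_hour


# Hypothetical $360/month GPU: break-even is about 1000 hours of audio per month.
print(round(break_even_audio_hours(360.0)))
```

Below the break-even volume, the API is the cheaper option under these assumptions. Above it, self-hosting wins as long as the GPU can actually process that much audio in a month.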

whisper-1 is running on large-v2. From what I see on GitHub, v3 is only slightly faster, so upgrading their systems to v3 might not make a huge difference for OpenAI just yet.

I agree. If you need v3, self-hosting with a dedicated GPU will make a huge difference, if that’s an option for you.
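If you go the self-hosted route, running large-v3 locally can look something like this. This is a minimal sketch, assuming the open-source `openai-whisper` package (a release recent enough to include large-v3) and `ffmpeg` on the PATH. The helper name `transcribe_local` is made up for illustration.

```python
def transcribe_local(audio_path, model_name="large-v3", model=None):
    """Transcribe one audio file with a locally loaded Whisper model.

    `transcribe_local` is a hypothetical helper name, not part of the package.
    """
    if model is None:
        # Imported lazily; assumes `pip install openai-whisper` and ffmpeg installed.
        import whisper

        # Downloads the weights on first use; large-v3 wants a GPU with plenty of memory.
        model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"]
```

If you transcribe many files, load the model once with `whisper.load_model("large-v3")` and pass it in through the `model` argument so the weights are not reloaded on every call.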

No, v3’s transcription and translation quality is better than the older version’s. I use it on Google Colab and its quality is superior. However, why isn’t OpenAI updating whisper-1 to the large-v3 model?

I think it really depends on the languages you’re targeting and how sophisticated the text in those languages is.

Benchmarks peg Whisper V2 and V3 as essentially identical for English, with V3 slightly better for most Western European languages and substantially better for many large Asian languages.

They almost certainly will either switch the endpoint to the new model or add a new endpoint targeting the newer model… eventually.

But doing so is a non-trivial undertaking fraught with risk, so it is reasonable for them to proceed with caution.

