Is the Whisper API really 10x more expensive than self-hosting?

I’m trying to make sense of another post on this forum: “Whisper API costs 10x more than hosting an VM?” (I’m not allowed to link it.)

From my tests, inference with both the OpenAI Whisper API and a self-hosted “insanely fast whisper” on a 4090 takes roughly 2 minutes for an hour of speech.

The Whisper API costs $0.36/h and you can rent a 4090 (spot instance) on RunPod for $0.39/h.

I feel like I’m missing something, but to me OpenAI seems to have very competitive pricing? :thinking:

One thing you’ve overlooked is that Whisper on OpenAI generates much faster than the length of the audio. That hour can be done in a minute.

You can also dispatch dozens of transcription tasks at the same time, lighting up a lot of datacenter to get you many hours back in a minute, something your $1000 GPU can’t do.

On the API, the price of the time you spend not transcribing is $0.00.

So there are many cost transition points to consider for your personal answer, even the time factor of making open source initially work. Math is fun.
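
A rough sketch of that math, using only the figures quoted in this thread. It assumes the API's $0.36/h is billed per hour of audio, while the 4090's $0.39/h is billed per hour the instance is rented, so the two "/h" numbers are not the same unit. The `utilization` parameter and the function names are hypothetical, made up for illustration.

```python
# Figures quoted in this thread; the billing interpretation is an assumption.
API_PRICE_PER_AUDIO_HOUR = 0.36    # USD, assumed billed per hour of audio
GPU_PRICE_PER_RENTAL_HOUR = 0.39   # USD, 4090 spot instance, per hour rented
GPU_MINUTES_PER_AUDIO_HOUR = 2.0   # ~2 min of compute per hour of speech


def gpu_cost_per_audio_hour(utilization: float = 1.0) -> float:
    """Cost of transcribing one hour of audio on the rented GPU.

    utilization is the fraction of rented time actually spent transcribing
    (hypothetical parameter; 1.0 means the GPU is never idle).
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    # How many hours of audio one fully busy rental hour can process.
    audio_hours_per_rental_hour = 60.0 / GPU_MINUTES_PER_AUDIO_HOUR
    return GPU_PRICE_PER_RENTAL_HOUR / (audio_hours_per_rental_hour * utilization)


def break_even_utilization() -> float:
    """GPU utilization below which the API becomes the cheaper option."""
    return gpu_cost_per_audio_hour(1.0) / API_PRICE_PER_AUDIO_HOUR


if __name__ == "__main__":
    full = gpu_cost_per_audio_hour(1.0)
    print(f"GPU, fully busy: ${full:.3f} per audio hour")
    print(f"API:             ${API_PRICE_PER_AUDIO_HOUR:.2f} per audio hour")
    print(f"Ratio:           {API_PRICE_PER_AUDIO_HOUR / full:.1f}x")
    print(f"Break-even GPU utilization: {break_even_utilization():.1%}")
```

Under those assumptions a fully busy 4090 comes out around $0.013 per audio hour, roughly 28x cheaper than the API. The advantage disappears once the GPU sits idle for more than about 96% of the time you are paying for it, and none of this counts setup or maintenance time.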


Hi! I think you might have misunderstood what I meant, and I should have been clearer.

From my tests, both OpenAI and a 4090 can transcribe 1 hour of audio in about 2 minutes, and they cost about the same ($0.36/h vs $0.39/h).

So my question is more related to the other post, where the author claims that they get a 20x cost benefit from running on a 4090, where to me it seems more like a 1 to 1 relationship :man_shrugging:

It’s always a tradeoff.

How long does it take you to set up a cloud instance? How long does it take you to set it up on your machine? How long does it take you to just call the API?

I think it’s generally understood that most cloud services are significantly more expensive than self-hosting.

edit: here’s the link: Whisper API costs 10x more than hosting an VM?