Whisper API costs 10x more than hosting a VM?

Here is what I’ve found…can someone correct me if I’m wrong here?

On runpod.io, renting a 4090 for an hour costs 74 cents.

So a 4090, as far as I understand, could transcribe around 186,000 words per hour. That’s around 20 people an hour for 74 cents…

Referencing the OpenAI pricing page: each minute costs $0.006.

This works out to $0.36 an hour for one user. Multiplied by 20, that’s $7.20.

That is a whopping, roughly 10x price difference (about 870% more). Did OpenAI make a typo? Are they really almost 10x more expensive than hosting your own version of Whisper?
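For anyone checking the math, here it is as a quick sketch (the RunPod rate and the 20-users-per-hour throughput are the assumptions from above, not measured figures):

```python
# Rough hourly cost comparison, using the figures quoted above.
API_PRICE_PER_MIN = 0.006     # OpenAI Whisper API, $ per minute of audio
GPU_PRICE_PER_HOUR = 0.74     # assumed RunPod 4090 rate, $ per hour
USERS_PER_GPU_HOUR = 20       # assumed throughput: ~20 hours of audio per GPU-hour

api_cost = API_PRICE_PER_MIN * 60 * USERS_PER_GPU_HOUR   # $7.20
ratio = api_cost / GPU_PRICE_PER_HOUR                    # ~9.7x
print(f"API: ${api_cost:.2f}/hr vs GPU: ${GPU_PRICE_PER_HOUR:.2f}/hr ({ratio:.1f}x)")
```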

3 Likes

Yes, I’ve found similar numbers. Not 100% sure why it’s almost a factor of 10 out :thinking:

One thing to consider is that you pay for the pod per hour, period, 24/7. So unless you have enough streams to keep the pod running at 20 users per hour around the clock, you would get very different numbers per actual minute transcribed?

Example:
Your pod for the month would be $0.74 × 24 hours × 30 days = $532.80.
At the API’s $0.36 per audio-hour, that’s the equivalent of about 1,480 hours of audio per month.
So if you have more than ~1,480 hours per month, you’d be cheaper off with the pod.
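The same break-even logic as a sketch in code (same assumed rates; a pod billed around the clock versus per-minute API pricing):

```python
# Break-even: dedicated pod billed 24/7 vs per-minute API pricing.
GPU_PRICE_PER_HOUR = 0.74         # assumed RunPod 4090 rate
API_PRICE_PER_HOUR = 0.006 * 60   # $0.36 per hour of audio

pod_monthly = GPU_PRICE_PER_HOUR * 24 * 30       # $532.80
break_even = pod_monthly / API_PRICE_PER_HOUR    # ~1,480 audio-hours
print(f"Pod: ${pod_monthly:.2f}/mo, break-even at {break_even:,.0f} audio-hours/month")
```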

1 Like

No you don’t - you pay for what you use. From the site…

[screenshot: RunPod’s pay-as-you-go pricing]

Unless I’ve got something incorrect. It seems this is a pay-as-you-go GPU cloud server that autoscales.

If a 4090 can handle 20 people at $0.74 an hour, then the same 20 users on the OpenAI API would cost $7.20. It’s a massive price difference, even at just 20 users in an hour.

1 Like

That is good to know - I was totally wrong there.
And yes - a big difference.
On the other hand you don’t have to ‘do’ anything other than make the API call in the case of OpenAI.
Competition and options are great! Curious to hear your experience with it!

2 Likes

I kind of think it’s actually a typo on their site? I feel like there’s no way they’re actually marking it up that much.

Well, Whisper is their model, right - you can’t run it ‘yourself’ on your own GPUs. So you are comparing the Whisper price per minute with ‘some other transcription model’.

Whisper is open source.

This is why it makes even less sense.

Put really bluntly: unless someone corrects me here… it seems you can just run your own Whisper model for a tenth of the cost of the OpenAI API. That’s a pretty big deal for anyone running a business that uses Whisper heavily.

(I know I’m sounding like an ad here, I’m not. I’ve more just come upon this discovery and am hoping someone comes and corrects me)
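To make “run your own Whisper” concrete, here’s a minimal sketch using the open-source openai-whisper package (pip install openai-whisper, plus ffmpeg on the system; the file name is just an example):

```python
# Minimal local transcription with the open-source Whisper package.
import whisper

model = whisper.load_model("large-v2")    # the same checkpoint the API serves
result = model.transcribe("meeting.mp3")  # hypothetical example file
print(result["text"])
```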

1 Like

It might also be a deliberate strategy - if anything, you can see from their ‘load’ that it’s not like OpenAI has tons of ‘spare GPUs lying around’ - speech-to-text is a ‘commodity’ that they do not prioritize? I can see that too in the pricing of commercial products like Fireflies.ai.

I recall that OpenAI uses a more efficient algorithm to perform the model tasks (improved beam search performance?).
But for one, I can’t find this in the docs anymore, and this is also exactly the kind of claim where, once someone actually tests it, the speed difference won’t be 10x either.
On the other hand I also recall some heavy users of the API preferring the service over local deployment, so maybe there is something to it.

Either way, you actually have to test it to confirm your calculations.

P.S. Found it:

We’ve now made the large-v2 model available through our API, which gives convenient on-demand access priced at $0.006 / minute. In addition, our highly-optimized serving stack ensures faster performance compared to other services.

Source: Introducing ChatGPT and Whisper APIs

I used to run Whisper through Hugging Face, but it was buggy: the HF version was limited to 30 seconds per file, and it kept crashing. Granted, I wasn’t sending a ton of data to it.

Since my use case is low-volume, I am more than OK with paying for a stable API that “just works”.
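For comparison, the “just works” path is a single call. A minimal sketch with the current openai Python SDK (the SDK interface has changed since this thread was written; the file name is illustrative):

```python
# Minimal OpenAI Whisper API call (openai>=1.0 Python SDK).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("voice_message.mp3", "rb") as f:  # example file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
print(transcript.text)
```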

OTOH, if I had to pump massive amounts of data through it, I would look at local hosting or some cloud endpoint to save costs.

You always have to do the math, and see what makes the most sense given your use case.

Similar to why MS charges hourly hosting fees for each fine-tuned OpenAI model you have, but charges 50-80% less per token than the OpenAI API. This makes sense for high-volume business usage patterns.

If you look through the Whisper repo, it basically states that the small and medium models are still great for English, running roughly 6x and 2x as fast as large, respectively. I was indeed planning on capturing the English-speaking market. That being said, OpenAI’s claim that their model runs fast is in reference to whisper-large, not whisper-medium or whisper-small.
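For English-only workloads like that, the repo also ships .en checkpoints (tiny.en through medium.en), which the README suggests perform better for English. A short sketch, again assuming the openai-whisper package:

```python
import whisper

# English-only variants trade multilingual coverage for better English performance.
model = whisper.load_model("medium.en")
print(model.transcribe("meeting.mp3")["text"])  # example file
```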

1 Like

10x as much?

Apparently this GPU endpoint gives you access to a 4090 with pay-as-you-go pricing - you don’t pay for what you don’t use. I can’t speak for what Hugging Face model you used, but the math points to you basically being charged per user with the API, versus being charged ‘in parallel’ for the hour with a cloud GPU - to me that’s a no-brainer for any company that eventually plans to have 20 people using it in one hour.

(I use 20 users because apparently that’s what a 4090 instance can handle in an hour.)

Edit: As far as run times go - with my current software it takes about 2:20 (two minutes twenty seconds) to transcribe an hour-long upload. It should take about 3:00 with a 4090. So that’s not a game-breaking increase, albeit a sizable one, depending on your use case.

1 Like

Logically, with the GPU priced per second and 10x cheaper, you would be right - it doesn’t make sense.

But priority-wise, I would be going from spending $2 per month to $0.20 per month. So while you are correct, I am still not motivated to put in the work to switch over to save a measly $1.80 per month! :rofl:

Get me to $50-100 per month, now we are talking!

1 Like

Well right!! haha

I’m currently developing something that deals with long-form audio - so for anyone also doing this, it’s kind of a no-brainer.

1 Like

I meant that it will be interesting to compare whether OpenAI delivers responses faster than a plain 4090 due to some improvements in their code.

But there are also some open-source repos that try to achieve speed increases, as the model is known to be very good.

Let’s put it this way.

Let’s say one hour on the OpenAI API is 36 cents for each individual upload.

If you have 1,000 people using it and they each make 6 one-hour uploads a day, that’s 6,000 uploads a day.

$0.36 × 6,000 uploads = $2,160/day

$2,160 × 20 working days in a month…

that’s $43,200 a month. Using the cloud GPU, it’s about $4,320.

The difference is so insane I think it has to be an error
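The same arithmetic as a reusable sketch (user count, upload rate, and the ~10x self-hosted discount are this post’s assumptions):

```python
# Monthly API cost for the hypothetical scenario above.
def monthly_api_cost(users, uploads_per_day, working_days=20,
                     price_per_audio_hour=0.36):
    return users * uploads_per_day * working_days * price_per_audio_hour

api = monthly_api_cost(1000, 6)  # $43,200
print(f"API: ${api:,.0f}/mo vs ~10x cheaper self-hosted: ${api / 10:,.0f}/mo")
```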


1 Like

It would most likely be a mistake if you were high volume and went with the 10x API pricing over a cheaper hosted GPU version, which sounds like your situation.

I think OAI pricing for Whisper is more for low volume folks that just want a simple API interface.

They released the Whisper model a while ago, and I remember at the announcement, there was a segment of the population, like me :rofl:, that didn’t want to hassle with external hosting, and would rather do an API.

But, since competition has heated up, sure, they should look at their pricing to draw in more of the “heavy user” crowd. If anything, have tiered pricing based on usage, the more you use, the more you save! Right?

It’s all marketing/pricing strategy.

I was happy to see OAI lower their prices on GPT-4-Turbo.

Let the competitive pricing games begin!

Isn’t there a cold start problem?


The API is limited by audio length and can’t crunch as many formats as an on-premises installation. I’m not using GPU clouds but my own servers for audio processing, and I noticed a huge difference in quality - I think the API is using a fast model, while on-premises I am using the large one. In other words - the API is worse.

However, I am using the API for transcribing voice messages, as it’s quicker than scheduling processing in my own server farm.