At $0.006 a minute, this works out to $0.36 an hour for one user. So multiplied by 20, that's $7.20 an hour.
That is a whopping price differential of roughly 810%. Did OpenAI make a typo? Are they really 800% more expensive than hosting your own version of Whisper?
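Spelling that out with the numbers quoted in this thread (the $0.006/minute API price and the ~$0.79/hour 4090 figure are what's been cited here, not something I've independently re-verified):

```python
# Rough hourly comparison, using the numbers quoted in this thread.
api_per_min = 0.006                          # OpenAI Whisper API, per audio minute
api_per_user_hour = api_per_min * 60         # $0.36 per user-hour
api_20_users_hour = api_per_user_hour * 20   # $7.20 for 20 users in an hour

pod_per_hour = 0.79                          # 4090 pod said to handle ~20 users

print(f"API, 20 users, 1 hour: ${api_20_users_hour:.2f}")
print(f"Pod, 20 users, 1 hour: ${pod_per_hour:.2f}")
print(f"Differential: {100 * (api_20_users_hour - pod_per_hour) / pod_per_hour:.0f}%")
# -> ~811%, i.e. roughly 9x the pod price at full utilisation
```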
One thing to consider is that you pay for the pod per hour, period, 24/7. So unless you actually HAVE the streams to keep the pod busy with 20 users per hour around the clock, the cost per actual minute transcribed looks very different.
Example:
Your pod for the month would be $0.74 x 24 x 30 = $532.80
Or the equivalent of roughly 1,480 hours of audio per month at the API's $0.36/hour rate.
So if you transcribe more than ~1,480 hours per month, you'd be better off with the pod.
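Same calculation spelled out, under the same always-on, 30-day assumption:

```python
# Break-even, assuming the pod runs 24/7 whether or not it has work to do.
pod_per_hour = 0.74
pod_per_month = pod_per_hour * 24 * 30       # ~$533 for a 30-day month

api_per_audio_hour = 0.006 * 60              # $0.36 per hour of audio via the API
break_even_hours = pod_per_month / api_per_audio_hour

print(f"Pod per month: ${pod_per_month:.0f}")
print(f"Break-even:    {break_even_hours:.0f} hours of audio per month")
# -> ~1,480 hours; below that, the always-on pod costs more than the API
```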
No you don't - you pay for what you use. From the site…
Unless I've got something wrong, this seems to be a pay-as-you-go GPU cloud server that autoscales.
If a 4090 can handle 20 people at $0.79 an hour, that means that if I had 20 users on the OpenAI API it would be $7.20. It's a massive price difference even at 20 users in an hour.
That is good to know - I was totally wrong there.
And yes - a big difference.
On the other hand you don’t have to ‘do’ anything other than make the API call in the case of OpenAI.
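To illustrate the "don't have to do anything" point, the whole integration is roughly this (current openai Python SDK; the file name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Whisper via the API: one call, no GPU, no serving stack to maintain.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```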
Competition and options are great! Curious to hear your experience with it!
Well, Whisper is their model, right - you can't run it 'yourself' on your own GPUs. So you are comparing the Whisper price per minute with 'some other transcription model'.
Put really bluntly: unless someone corrects me here… it seems you can just run your own Whisper model for about a tenth of the cost of the OpenAI API. That's a pretty big deal for anyone running a business and using Whisper heavily.
(I know I’m sounding like an ad here, I’m not. I’ve more just come upon this discovery and am hoping someone comes and corrects me)
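For concreteness, "running your own Whisper" with the open-source weights looks roughly like this - a minimal sketch using the openai-whisper package (needs ffmpeg and, realistically, a GPU for the large model; the file name and model choice are just examples):

```python
# pip install openai-whisper
import whisper

# large-v2 matches what the API serves; "medium" or "small" are faster
# and still strong for English-only use cases.
model = whisper.load_model("large-v2")

result = model.transcribe("meeting.mp3")  # placeholder file name
print(result["text"])
```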
It might also be a deliberate strategy - if anything, you can see from their 'load' that it's not like OpenAI has tons of 'spare GPUs lying around' - speech-to-text is a 'commodity' that they do not prioritize? I can see that too in the pricing of commercial products like Fireflies.ai.
I recall that OpenAI uses a more efficient algorithm to perform the model tasks (improved beam search performance?).
But for one, I cannot find this in the docs anymore, and also this is exactly the kind of claim where, once someone actually tests it, it turns out not to be a 10x difference either.
On the other hand I also recall some heavy users of the API preferring the service over local deployment, so maybe there is something to it.
Either way, you actually have to test it to confirm your calculations.
PS.
Found it:
We’ve now made the large-v2 model available through our API, which gives convenient on-demand access priced at $0.006 / minute. In addition, our highly-optimized serving stack ensures faster performance compared to other services.
I used to run Whisper through Hugging Face, but it was buggy and the HF version was limited to 30 seconds per file. Plus it kept crashing - and I wasn't even sending a ton of data to it.
Since my use case is low-volume, I am more than OK with paying for a stable API that “just works”.
OTOH, if I had to pump massive amounts of data through it, I would look at local or some cloud endpoint to save costs.
You always have to do the math, and see what makes the most sense given your use case.
Similar to why MS charges hourly hosting fees for each fine-tuned OpenAI model you have, but charges 50-80% less per token than the OpenAI API. This makes sense for high-volume business usage patterns.
It's basically stating that the small and medium models are still great for English and run roughly 6x and 2x as fast, respectively. I was indeed planning on targeting the English-speaking market, so that said, their claim about a fast serving stack is in reference to whisper-large, not whisper-medium or whisper-small.
Apparently this GPU endpoint gives you access to a 4090 with pay-as-you-go pricing - you don't pay for what you don't use. I can't speak for what Hugging Face model you used, but the math points to this: you're basically being charged per user with the API and charged 'in parallel' for the hour with a cloud GPU - to me that's a no-brainer for any company that eventually plans to have 20 people using it in one hour.
(I use 20 users because apparently that's what a 4090 instance can handle in an hour.)
Edit: As far as run times go - with my current software it takes about 2 minutes 20 seconds to process an hour-long upload. It should take about 3 minutes with a 4090. So that's not a game-breaking increase, albeit a sizeable percentage one, depending on your use case.
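For what it's worth, a pay-as-you-go "serverless" GPU endpoint like the one described above is usually just a small worker script billed per second while it runs. A minimal sketch, assuming a RunPod-style serverless SDK and the faster-whisper package - neither of which is confirmed as what this particular endpoint actually uses:

```python
# Hypothetical serverless worker: billed while a request runs, scaled down when idle.
# Assumes the runpod Python SDK and faster-whisper; both are my assumptions here.
import runpod
from faster_whisper import WhisperModel

# Loaded once per worker process, reused across requests.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

def handler(job):
    audio_path = job["input"]["audio_path"]   # hypothetical input field
    segments, info = model.transcribe(audio_path)
    return {"text": " ".join(seg.text for seg in segments)}

runpod.serverless.start({"handler": handler})
```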
Logically, with the GPU priced per second and the roughly 10x cheaper rate, you would be right - it doesn't make sense.
But priority-wise, I would be going from spending $2 per month to $0.20 per month. So while you are correct, I am still not motivated to put in the work to switch over to save a measly $1.80 per month!
It would most likely be a mistake if you were high volume and went with 10x API pricing over a cheaper hosted GPU version, which sounds like your situation.
I think OAI pricing for Whisper is more for low volume folks that just want a simple API interface.
They released the Whisper model a while ago, and I remember at the announcement there was a segment of the population, like me, that didn't want the hassle of external hosting and would rather use an API.
But, since competition has heated up, sure, they should look at their pricing to draw in more of the “heavy user” crowd. If anything, have tiered pricing based on usage, the more you use, the more you save! Right?
It’s all marketing/pricing strategy.
I was happy to see OAI lower their prices on GPT-4-Turbo.
The API is limited by audio length and can't crunch as many formats as an on-premise installation. I'm not using GPU clouds but my own servers for audio processing, and I noticed a huge difference in quality - I think the API is using a fast model, while on premise I am using the large one. In other words - the API is worse.
However, I am using the API for transcribing voice messages, as it's quicker than scheduling processing in my own server farm.