At $0.006 a minute, this works out to $0.36 an hour for one user. So multiplied by 20, that's $7.20 an hour.
That is a whopping price differential of roughly 810%. Did OpenAI make a typo? Are they really 800% more expensive than hosting your own version of Whisper?
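Spelling that out with the numbers quoted in this thread (the $0.006/minute API price and the ~$0.79/hour 4090 figure are what's been cited here, not something I've independently re-verified):

```python
# Rough hourly comparison, using the numbers quoted in this thread.
api_per_min = 0.006                          # OpenAI Whisper API, per audio minute
api_per_user_hour = api_per_min * 60         # $0.36 per user-hour
api_20_users_hour = api_per_user_hour * 20   # $7.20 for 20 users in an hour

pod_per_hour = 0.79                          # 4090 pod said to handle ~20 users

print(f"API, 20 users, 1 hour: ${api_20_users_hour:.2f}")
print(f"Pod, 20 users, 1 hour: ${pod_per_hour:.2f}")
print(f"Differential: {100 * (api_20_users_hour - pod_per_hour) / pod_per_hour:.0f}%")
# -> ~811%, i.e. roughly 9x the pod price at full utilisation
```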
One thing to consider is that you pay for the pod per hour, period, 24/7. So unless you actually HAVE the streams to keep the pod busy with 20 users per hour around the clock, the cost per actual minute transcribed looks very different.
Example:
Your pod for the month would be $0.74 x 24 x 30 = $532.80
Or the equivalent of roughly 1,480 hours of audio per month at the API's $0.36/hour rate.
So if you transcribe more than ~1,480 hours per month, you'd be better off with the pod.
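Same calculation spelled out, under the same always-on, 30-day assumption:

```python
# Break-even, assuming the pod runs 24/7 whether or not it has work to do.
pod_per_hour = 0.74
pod_per_month = pod_per_hour * 24 * 30       # ~$533 for a 30-day month

api_per_audio_hour = 0.006 * 60              # $0.36 per hour of audio via the API
break_even_hours = pod_per_month / api_per_audio_hour

print(f"Pod per month: ${pod_per_month:.0f}")
print(f"Break-even:    {break_even_hours:.0f} hours of audio per month")
# -> ~1,480 hours; below that, the always-on pod costs more than the API
```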
No you don't - you pay for what you use. From the site…
Unless I've got something wrong, this seems to be a pay-as-you-go GPU cloud server that autoscales.
If a 4090 can handle 20 people at $0.79 an hour, that means that if I had 20 users on the OpenAI API it would be $7.20. It's a massive price difference even at 20 users in an hour.
That is good to know - I was totally wrong there.
And yes - a big difference.
On the other hand you don’t have to ‘do’ anything other than make the API call in the case of OpenAI.
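To illustrate the "don't have to do anything" point, the whole integration is roughly this (current openai Python SDK; the file name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Whisper via the API: one call, no GPU, no serving stack to maintain.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```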
Competition and options are great! Curious to hear your experience with it!
Well, Whisper is their model, right - you can't run it 'yourself' on your own GPUs. So you are comparing the Whisper price per minute with 'some other transcription model'.
Put really bluntly: unless someone corrects me here… it seems you can just run your own Whisper model for about a tenth of the cost of the OpenAI API. That's a pretty big deal for anyone running a business and using Whisper heavily.
(I know I’m sounding like an ad here, I’m not. I’ve more just come upon this discovery and am hoping someone comes and corrects me)
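For concreteness, "running your own Whisper" with the open-source weights looks roughly like this - a minimal sketch using the openai-whisper package (needs ffmpeg and, realistically, a GPU for the large model; the file name and model choice are just examples):

```python
# pip install openai-whisper
import whisper

# large-v2 matches what the API serves; "medium" or "small" are faster
# and still strong for English-only use cases.
model = whisper.load_model("large-v2")

result = model.transcribe("meeting.mp3")  # placeholder file name
print(result["text"])
```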
It might also be a deliberate strategy - if anything, you can see from their 'load' that it's not like OpenAI has tons of 'spare GPUs lying around' - speech-to-text is a 'commodity' that they do not prioritize? I can see that too in the pricing of commercial products like Fireflies.ai.
I recall that OpenAI uses a more efficient algorithm to perform the model tasks (improved beam search performance?).
But for one, I cannot find this in the docs anymore, and also this is exactly the kind of claim where, once someone actually tests it, it turns out not to be a 10x difference either.
On the other hand I also recall some heavy users of the API preferring the service over local deployment, so maybe there is something to it.
Either way, you actually have to test it to confirm your calculations.
PS.
Found it:
We’ve now made the large-v2 model available through our API, which gives convenient on-demand access priced at $0.006 / minute. In addition, our highly-optimized serving stack ensures faster performance compared to other services.
I used to run Whisper through Hugging Face, but it was buggy and the HF version was limited to 30 seconds per file. Plus it kept crashing - and I wasn't even sending a ton of data to it.
Since my use case is low-volume, I am more than OK with paying for a stable API that “just works”.
OTOH, if I had to pump massive amounts of data through it, I would look at local or some cloud endpoint to save costs.
You always have to do the math, and see what makes the most sense given your use case.
Similar to why MS charges hourly hosting fees for each fine-tuned OpenAI model you have, but charges 50-80% less per token than the OpenAI API. This makes sense for high-volume business usage patterns.
It's basically stating that the small and medium models are still great for English and run roughly 6x and 2x as fast, respectively. I was indeed planning on targeting the English-speaking market, so that said, their claim about a fast serving stack is in reference to whisper-large, not whisper-medium or whisper-small.
Apparently this GPU endpoint gives you access to a 4090 with pay-as-you-go pricing - you don't pay for what you don't use. I can't speak for what Hugging Face model you used, but the math points to this: you're basically being charged per user with the API and charged 'in parallel' for the hour with a cloud GPU - to me that's a no-brainer for any company that eventually plans to have 20 people using it in one hour.
(I use 20 users because apparently that's what a 4090 instance can handle in an hour.)
Edit: As far as run times go - with my current software it takes about 2 minutes 20 seconds to process an hour-long upload. It should take about 3 minutes with a 4090. So that's not a game-breaking increase, albeit a sizeable percentage one, depending on your use case.
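For what it's worth, a pay-as-you-go "serverless" GPU endpoint like the one described above is usually just a small worker script billed per second while it runs. A minimal sketch, assuming a RunPod-style serverless SDK and the faster-whisper package - neither of which is confirmed as what this particular endpoint actually uses:

```python
# Hypothetical serverless worker: billed while a request runs, scaled down when idle.
# Assumes the runpod Python SDK and faster-whisper; both are my assumptions here.
import runpod
from faster_whisper import WhisperModel

# Loaded once per worker process, reused across requests.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

def handler(job):
    audio_path = job["input"]["audio_path"]   # hypothetical input field
    segments, info = model.transcribe(audio_path)
    return {"text": " ".join(seg.text for seg in segments)}

runpod.serverless.start({"handler": handler})
```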
Logically, with the GPU priced per second and the roughly 10x cheaper rate, you would be right - it doesn't make sense.
But priority-wise, I would be going from spending $2 per month to $0.20 per month. So while you are correct, I am still not motivated to put in the work to switch over to save a measly $1.80 per month!
It would most likely be a mistake if you were high volume and went with 10x API pricing over a cheaper hosted GPU version, which sounds like your situation.
I think OAI pricing for Whisper is more for low volume folks that just want a simple API interface.
They released the Whisper model a while ago, and I remember at the announcement there was a segment of the population, like me, that didn't want the hassle of external hosting and would rather use an API.
But, since competition has heated up, sure, they should look at their pricing to draw in more of the “heavy user” crowd. If anything, have tiered pricing based on usage, the more you use, the more you save! Right?
It’s all marketing/pricing strategy.
I was happy to see OAI lower their prices on GPT-4-Turbo.
The API is limited by audio length and can't crunch as many formats as an on-premise installation. I'm not using GPU clouds but my own servers for audio processing, and I noticed a huge difference in quality - I think the API is using a fast model, while on premise I am using the large one. In other words - the API is worse.
However, I am using the API for transcribing voice messages, as it's quicker than scheduling processing in my own server farm.