Whisper - opaque charges?

daraj · February 28, 2024, 1:13am

I am using Whisper, and from my calculations, I’m being overcharged quite a bit (about 25% more than what I am sending). I noticed this and then I had an idea - I sped up the files using ffmpeg before I sent them to the API. Not sure if this is explicitly allowed, but I scoured the ToS and could find nothing prohibiting it. Anyway, it was only a test with a lowish volume of audio. The transcription accuracy is almost the same with a 2x speedup of the input file but, astonishingly, I am being charged the same as if I didn’t speed them up.

We get no indication of pricing back from the API, only the tokens returned. Does the Whisper API actually charge on a per-token basis instead of a per-minute basis? Is there any visibility into this?

anon22939549 · February 28, 2024, 3:26am

Whisper is $0.006 / minute (rounded to the nearest second)

So, that’s $0.0001 / second.

If you get the same accuracy at 2x speed, I suppose that’s a clever way to cut your costs in half.
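As a rough sketch of the arithmetic above, assuming the per-minute rate and the nearest-second rounding quoted in this thread (the function name `estimateWhisperCost` is made up for illustration):

```javascript
// Rate quoted in this thread: $0.006 per minute = $0.0001 per second.
// Assumption: the duration is rounded to the nearest whole second before billing.
const SECONDS_PER_DOLLAR = 10000; // 1 / 0.0001

// Hypothetical helper: estimated charge in dollars for one audio file.
function estimateWhisperCost(durationSeconds) {
  const billedSeconds = Math.round(durationSeconds);
  return billedSeconds / SECONDS_PER_DOLLAR;
}

console.log(estimateWhisperCost(60));   // one minute of audio
console.log(estimateWhisperCost(30));   // the same audio sped up 2x
console.log(estimateWhisperCost(3600)); // one hour of audio
```

Dividing by a whole number of seconds per dollar, rather than multiplying by 0.0001, keeps the results free of floating-point noise.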

Hell, cut all your audio down to ~10s chunks, ensuring the length is such that it always rounds down. Don’t forget to skim through and trim the silence between words.

Could probably get your costs down to about 1/3 of what they would normally be.

daraj · February 28, 2024, 3:37am

That’s my point though. I’m sending in half the audio and I’m still being charged more than if the files were at 1x speed.

anon22939549 · February 28, 2024, 3:41am

I’d need to see more evidence to verify that.

daraj · February 28, 2024, 3:53am

How can I even prove it if I can’t find what I’m being charged for a request? There is no itemized bill available.

_j · February 28, 2024, 4:38am

It is pretty apparent from the audio quality drop that a request with a different speed does not change the way audio is generated by AI, that the audio is just passed through a time-slicing pitch/tempo changer to get to a new speed without pitch alteration.

daraj · February 28, 2024, 1:10pm

Pitch isn’t altered in the files I send in. Think of speeding up a voice note in WhatsApp. So, you’re saying that OpenAI essentially charges by token, and not by the second as its docs state?

_j · February 28, 2024, 1:27pm

Sorry, I was thinking of the text-to-speech output, which uses overlapping time-slicing to lengthen or shorten the output audio you receive, with artifacts.

Speeding audio up seems a decent way to save some money, but I would compromise by allowing some increase in pitch so that there is less choppy time slicing going on.

I investigated the price before, and did so again a few weeks ago. Send exactly an hour, get billed for an hour. ($0.006 / minute x 60 minutes = $0.36)

The request powering the bar graph for cost reports it in cents, with precision to fractions of a cent:

Note that these are combined with other requests for a UTC date cutoff, and may have a delay in showing up.
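Since there is no itemized bill, one way to cross-check the daily figure is to keep a local log of every file sent and total it per UTC day, matching the cutoff mentioned above. A minimal sketch, assuming the $0.006 / minute rate and nearest-second rounding quoted earlier in the thread; the record shape (`sentAt`, `durationSeconds`), the function name, and the sample values are all hypothetical:

```javascript
// Hypothetical local log: one record per file uploaded for transcription.
// sentAt is an ISO 8601 timestamp; durationSeconds is the length of the file actually sent.
// These sample values are made up for illustration only.
const sentFiles = [
  { sentAt: "2024-02-28T23:59:30Z", durationSeconds: 30.05 },
  { sentAt: "2024-02-29T00:00:10Z", durationSeconds: 60.1 },
  { sentAt: "2024-02-29T13:45:00Z", durationSeconds: 3600 },
];

// Sum the billed seconds per UTC calendar day and convert the total to cents.
// Assumption: each request is rounded to the nearest second and billed
// at $0.0001 (0.01 cents) per second.
function expectedCentsByUtcDay(records) {
  const billedSecondsByDay = new Map();
  for (const { sentAt, durationSeconds } of records) {
    const day = new Date(sentAt).toISOString().slice(0, 10); // YYYY-MM-DD in UTC
    const billed = Math.round(durationSeconds);
    billedSecondsByDay.set(day, (billedSecondsByDay.get(day) || 0) + billed);
  }
  const centsByDay = {};
  for (const [day, seconds] of billedSecondsByDay) {
    centsByDay[day] = seconds / 100; // 0.01 cents per billed second
  }
  return centsByDay;
}

console.log(expectedCentsByUtcDay(sentFiles));
```

If the dashboard total for a day is consistently higher than the logged total, the difference points to requests or durations that the log is not capturing.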

sps · February 28, 2024, 7:32pm

How are you speeding up the audio with ffmpeg?

Can you share a sped up sample?

daraj · February 28, 2024, 8:06pm

Sure, I am speeding the files up with the ffmpeg library in node:

  ffmpeg()
    .input(inputFilePath)
    .audioFilter(`atempo=2`)
    .on("end", () => callback(null, outputFilePath))
    .on("error", (err) => callback(err))
    .save(outputFilePath);

Which I believe is the same as this command:

ffmpeg -i inputFilePath -filter:a "atempo=2" outputFilePath

Here’s the before file:

And the sped up file, which gets sent to Whisper:

Transcript:

This is a test file I’m going to record to test Whisper’s charges, um, I don’t know what to say, but I’ll, uh, just say what I see. Actually, you know what? I’m gonna get a bottle of water, and I’m gonna open the water, I’m gonna sit back down, um, yeah, should we go for a minute? Yeah, I’ll go for exactly a minute, see how that works out. It’ll be an M4A file, I believe. Um, I am looking at the clock, oh, it’s one minute past eight, so that means I’m late, but it’s okay. Um, yeah, we have eight seconds left, uh, hopefully I stopped exactly on time, and I’m gonna finish now.

sps · February 29, 2024, 1:59am

I tested the 2x sped-up file you provided for transcription, and I can confirm that the API bills only for the duration of the sped-up file (rounded to the nearest second).

Here are the details of the original file:

File Type: MP3
Duration: 60.10 seconds
Sample Rate: 48000 Hz
Channels: 1 (Mono)
Bit Rate: 128.069 kbps

The sped-up file has the following:

File Type: MP3
Duration: 30.05 seconds
Sample Rate: 48000 Hz
Channels: 1 (Mono)
Bit Rate: 64.088 kbps

The 2x sped-up file provided by you was the only one I transcribed today and here’s the screenshot of usage:

Hence if you upload the 2x sped-up file, you’ll only be billed for half the duration.
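Because billing follows the duration of the file that is actually uploaded, it can help to measure that duration right before sending. A minimal sketch using the `ffprobe` CLI that ships with ffmpeg, assuming it is on the PATH; the helper names and the tolerance value are illustrative, not part of any library:

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Parse the plain-text duration (in seconds) that ffprobe prints with the flags below.
function parseDuration(stdout) {
  const seconds = Number.parseFloat(stdout.trim());
  if (!Number.isFinite(seconds)) {
    throw new Error(`Unexpected ffprobe output: ${JSON.stringify(stdout)}`);
  }
  return seconds;
}

// Ask ffprobe for the container-level duration of an audio file.
async function probeDuration(filePath) {
  const { stdout } = await execFileAsync("ffprobe", [
    "-v", "error",
    "-show_entries", "format=duration",
    "-of", "default=noprint_wrappers=1:nokey=1",
    filePath,
  ]);
  return parseDuration(stdout);
}

// True if the sped-up file is as short as the tempo factor implies.
// toleranceSeconds is an arbitrary allowance for encoder padding.
function matchesSpeedup(originalSeconds, spedUpSeconds, tempo, toleranceSeconds = 0.5) {
  return Math.abs(originalSeconds / tempo - spedUpSeconds) <= toleranceSeconds;
}

// Hypothetical pre-upload check: refuse to send a file that was not actually shortened.
async function assertSpedUp(inputFilePath, outputFilePath, tempo) {
  const [before, after] = await Promise.all([
    probeDuration(inputFilePath),
    probeDuration(outputFilePath),
  ]);
  if (!matchesSpeedup(before, after, tempo)) {
    throw new Error(`Expected about ${before / tempo}s, but output is ${after}s`);
  }
  return after;
}
```

With the durations reported above, `matchesSpeedup(60.10, 30.05, 2)` holds, whereas uploading the original file by mistake would fail the check before any request is made.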

daraj · February 29, 2024, 2:13am

Thank you very much for testing that for me. I have looked extensively at my code and tested what is being sent to Whisper, but I must be making a mistake somewhere. Thanks again.

vb · October 28, 2024, 7:00am

This topic was automatically closed after 16 hours. New replies are no longer allowed.
