I'm not 100% sure how the pricing works, but for gpt-4o-mini-tts:
It's $0.60 per million input tokens, and that covers both the text and the instructions.
It's $12 per million tokens for the audio out, which I assume is counted from the audio tokens generated.
It works out to about 1.5 cents per minute of audio, although that varies widely for me. On short texts I get charged roughly double that, but on my minute-long tests it was just about spot on.
Big gotcha: sometimes the audio generation can go off the rails. The same roughly 1-minute text block I gave it once ran for over 3 minutes, and the last 2.5 minutes were silence. That was a big screwup on the API side, and of course it still charged me for 3 minutes of audio.
The input text tokens are quite inexpensive: for the input alone to reach $0.015 (the cost of just 1 minute of output), the request would have to contain around 25k instruction tokens ($0.015 / $0.60 per million).
So basically, most of what you pay is for the output.
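If it helps, here's a back-of-the-envelope calculator based on the numbers above. Treat the rates and the audio-tokens-per-minute figure as my own estimates (roughly $0.015/min divided by $12 per million tokens), not official constants:

```python
# Rough cost estimate for gpt-4o-mini-tts, using the rates quoted above.
TEXT_IN_PER_M = 0.60         # $ per 1M input tokens (text + instructions)
AUDIO_OUT_PER_M = 12.00      # $ per 1M audio output tokens
AUDIO_TOKENS_PER_MIN = 1250  # assumption: $0.015/min divided by $12 per 1M tokens

def estimate_cost(input_tokens: int, audio_minutes: float) -> float:
    """Estimate the dollar cost of one TTS request."""
    input_cost = input_tokens / 1_000_000 * TEXT_IN_PER_M
    output_cost = audio_minutes * AUDIO_TOKENS_PER_MIN / 1_000_000 * AUDIO_OUT_PER_M
    return input_cost + output_cost

# ~1,000 characters of text (~250 tokens) producing ~1 minute of audio:
print(estimate_cost(250, 1.0))  # roughly $0.015, almost all of it from the output
```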
I'm not affiliated with OpenAI; I'm just speaking from my experience as a user.
If you want harder proof, there's no need to take my word for it.
You can easily check it yourself: use the Playground to convert a text of about 1 minute (roughly 1,000 characters) and then look at the costs dashboard.
It will tell you how many input text tokens and how many output audio tokens were used; compare that against the audio file generated and you can reach your own conclusions.
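If you prefer to run the same test from code instead of the Playground, here's a minimal sketch using the official openai Python SDK. The voice, instructions, and file name are placeholders, and I'm using the `instructions` parameter the way the TTS docs describe it, so double-check against the current API reference:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

text = "Put roughly 1,000 characters of your own text here..."

# Generate ~1 minute of speech, then check the costs dashboard to see
# how many input text tokens and output audio tokens this request used.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",                                  # placeholder voice
    input=text,
    instructions="Speak in a calm, neutral tone.",  # instructions count as input tokens too
) as response:
    response.stream_to_file("test_minute.mp3")
```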
ps: The costs dashboard has a CSV export that gives you the costs down to the exact fraction of a cent.
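If you want to total those up programmatically, something like this works on the export; the column names here are hypothetical, so adjust them to whatever header your CSV actually has:

```python
import csv
from collections import defaultdict

# Sum the exported costs per line item. "line_item" and "amount" are
# placeholder column names -- check the header row of your own export.
totals = defaultdict(float)
with open("costs_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        totals[row["line_item"]] += float(row["amount"])

for line_item, amount in sorted(totals.items()):
    print(f"{line_item}: ${amount:.6f}")
```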
I did, but didn't notice much of a difference. The whisper-1 model is already pretty good, but it's helpful to have different models for when a transcription goes wrong; sometimes one of the other models works better.
Short audios in particular tend to come out badly if the recording quality isn't great, so having alternatives is always nice.
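For what it's worth, here's a rough sketch of that "fall back to another model" idea. The model line-up and the sanity check are my own assumptions; swap in whichever transcription models you actually have access to:

```python
from openai import OpenAI

client = OpenAI()

# Models to try in order; adjust to the transcription models available to you.
MODELS = ["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"]

def transcribe(path: str) -> str:
    """Try each model in turn and return the first non-empty transcription."""
    last_error = None
    for model in MODELS:
        try:
            with open(path, "rb") as audio:
                result = client.audio.transcriptions.create(model=model, file=audio)
            if result.text.strip():   # crude sanity check for empty output
                return result.text
        except Exception as exc:      # keep the last failure for the error message
            last_error = exc
    raise RuntimeError(f"All transcription models failed: {last_error}")

print(transcribe("short_clip.mp3"))
```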
ps: I forgot to add that, according to the docs, transcription of less popular languages (Hindi, for example) has improved a lot more, but since I don't speak any of them I can't say for sure.