Per 1 million tokens, mini TTS costs around 12 dollars. That is what the pricing says. But when I converted a text of 20,000 tokens to speech, it cost me more than 1 dollar. Normally, 20,000 tokens should only cost around 0.3 dollars, right? Could it be that a slow speaking pace increases the duration in minutes? I created 18 blocks of audio and they are all 5 minutes long: 90 minutes in total, but only 20 thousand tokens. I would really appreciate your help. Thank you.
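The pricing is a composite of text input tokens and audio output tokens, which averages out to about $0.015 per minute of generated audio. At that rate, 90 minutes of audio comes to roughly $1.35, which is consistent with the more than 1 dollar you were charged.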
- https://platform.openai.com/docs/models/gpt-4o-mini-tts
- https://platform.openai.com/docs/pricing#transcription-and-speech-generation
Yes, if you instruct it to speak more slowly, the audio runs longer and consumes more audio output tokens.
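To make the arithmetic concrete, here is a rough sketch comparing the two ways of estimating the bill. The rates are only the figures quoted in this thread (about $0.015 per minute of audio, and the $12 per 1 million tokens from the question applied naively to the text tokens), not an official price sheet, and the function names are made up for illustration.

```python
# Rough cost comparison using only the figures quoted in this thread.
# Both rates are assumptions for illustration; check the pricing page for current values.
PRICE_PER_AUDIO_MINUTE_USD = 0.015        # approximate average per minute of audio
PRICE_PER_MILLION_TOKENS_USD = 12.0       # figure from the question


def estimate_by_minutes(minutes: float) -> float:
    """Estimate cost from the duration of the generated audio."""
    return minutes * PRICE_PER_AUDIO_MINUTE_USD


def estimate_by_text_tokens(tokens: int) -> float:
    """Naive estimate that applies the per-million rate to the input text tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


blocks = 18
minutes_per_block = 5
total_minutes = blocks * minutes_per_block  # 90 minutes of audio

print(f"By audio duration: ${estimate_by_minutes(total_minutes):.2f}")
print(f"By text tokens:    ${estimate_by_text_tokens(20_000):.2f}")
```

The duration-based estimate is the one that tracks the bill, because most of the cost comes from the audio that is generated rather than from the text that is sent in.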