According to the documentation, Whisper costs $0.006 per minute, while TTS costs $15.00 per 1M characters. It would be nice if there were an estimated cost per minute, like the new 4o-mini audio models have, so that current Whisper users have a better idea of potential cost savings.
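To put the two published rates side by side per minute, a quick sketch like the one below can help. The two rates are the ones quoted above. The speaking rate (characters of text per minute of generated speech) is purely an assumption you would need to measure on your own content, and the function names are just illustrative.

```python
# Rates quoted above from the documentation.
WHISPER_USD_PER_MINUTE = 0.006
TTS_USD_PER_MILLION_CHARS = 15.00


def whisper_cost(minutes: float) -> float:
    """Transcription cost for a given length of audio, at the per-minute rate."""
    return minutes * WHISPER_USD_PER_MINUTE


def tts_cost(characters: float) -> float:
    """Speech-synthesis cost for a given amount of input text, at the per-character rate."""
    return characters * TTS_USD_PER_MILLION_CHARS / 1_000_000


def tts_cost_per_minute(chars_per_minute: float) -> float:
    """Rough TTS cost per minute of generated speech.

    chars_per_minute is an ASSUMED speaking rate, not an official figure;
    measure it on your own text to get a meaningful estimate.
    """
    return tts_cost(chars_per_minute)


if __name__ == "__main__":
    print(f"Whisper, 60 min: ${whisper_cost(60):.2f}")  # Whisper, 60 min: $0.36
    # 1,000 characters per minute is a placeholder speaking rate.
    print(f"TTS per minute at 1,000 chars/min: ${tts_cost_per_minute(1000):.3f}")
```

Because the TTS price is per character, any per-minute figure for it depends entirely on how much text your speech contains per minute, which is why that number is left as an input here.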
Also, aside from the prompt, what else can contribute toward the input token cost of 4o-transcribe and 4o-mini-transcribe?
I am a bit confused too. It seems to use a concept of audio tokens, which don't convert directly into “minutes”. I couldn’t find any further information, but the API response does tell you how many tokens were consumed.
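If you want a per-minute figure anyway, one way is to back it out yourself: transcribe a clip of known length, note the audio-token count the API reports, and scale that to one minute. The sketch below assumes token usage grows roughly linearly with duration, and the token count and per-1M-token price in the example are made-up placeholders; plug in the current price from the pricing page.

```python
def tokens_per_minute(audio_tokens: int, clip_seconds: float) -> float:
    """Audio tokens per minute, measured from one clip of known length."""
    if clip_seconds <= 0:
        raise ValueError("clip_seconds must be positive")
    return audio_tokens * 60.0 / clip_seconds


def cost_per_minute(audio_tokens: int, clip_seconds: float,
                    usd_per_million_audio_tokens: float) -> float:
    """Estimated cost per minute of audio, ASSUMING tokens scale linearly with duration."""
    per_minute = tokens_per_minute(audio_tokens, clip_seconds)
    return per_minute * usd_per_million_audio_tokens / 1_000_000


if __name__ == "__main__":
    # Placeholder numbers: 500 audio tokens reported for a 50-second clip,
    # and a hypothetical price of $5.00 per 1M audio tokens.
    print(f"{tokens_per_minute(500, 50):.0f} tokens/min")  # 600 tokens/min
    print(f"${cost_per_minute(500, 50, 5.00):.4f} per minute")
```

Measuring a few clips of different lengths would show whether the linear assumption holds well enough for your audio.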
What I can say is that, all things considered, it is very low cost; you can check the actual spend on your usage dashboard.
Basically, for TTS you pay for the instruction prompt, which is counted in ordinary text tokens, plus the audio tokens for the generated audio.
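In other words, a single TTS request's cost is the sum of two parts billed at different rates. Here is a minimal sketch of that sum. The `TtsUsage` container is a hypothetical helper, not part of any SDK, and both per-1M-token prices are left as parameters because the example values are placeholders, not real prices.

```python
from dataclasses import dataclass


@dataclass
class TtsUsage:
    """Hypothetical container for the token counts of one TTS request."""
    instruction_text_tokens: int  # text tokens for the prompt/instructions
    audio_output_tokens: int      # audio tokens for the generated speech


def tts_request_cost(usage: TtsUsage,
                     usd_per_million_text_tokens: float,
                     usd_per_million_audio_tokens: float) -> float:
    """Text part at the text-token rate plus audio part at the audio-token rate."""
    text_part = usage.instruction_text_tokens * usd_per_million_text_tokens / 1_000_000
    audio_part = usage.audio_output_tokens * usd_per_million_audio_tokens / 1_000_000
    return text_part + audio_part


if __name__ == "__main__":
    # Placeholder token counts and prices, for illustration only.
    usage = TtsUsage(instruction_text_tokens=200, audio_output_tokens=1200)
    print(f"${tts_request_cost(usage, 1.00, 10.00):.4f}")
```

With realistic prompts the instruction text is usually short, so in practice the audio-token part is likely to dominate, but that depends on your own prompts and output lengths.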