Realtime API pricing is wrong, will overcharge

I can’t see what your usage page shows, so I can’t speak to that.

What I can tell you is that you’re charged for all tokens the model generates, whether you hear them or not.

So, if you ask the model for a long story, it might finish the computation in 10 seconds for audio that will take 3 minutes to play back. Even if you interrupt the playback after 15 seconds, you still pay for the full 3 minutes’ worth of generation.

Now, your voice input is tokenized as audio at 10 tokens per second, and is also tokenized as text at roughly 1.3x the token count of o200k_base. Output audio is tokenized at 20 tokens per second of audio, with the same text tokenization at about 1.3x the normal rate. System messages are tokenized as plain text.
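To make the arithmetic concrete, here’s a minimal sketch using the per-second rates described above (these constants come from this post, not from any official pricing table, so verify them against the current docs):

```python
# Rough audio-token estimates, assuming the rates stated above:
# input audio ~10 tokens/second, output audio ~20 tokens/second.
INPUT_AUDIO_TOKENS_PER_SEC = 10
OUTPUT_AUDIO_TOKENS_PER_SEC = 20

def estimate_audio_tokens(seconds: float, direction: str) -> int:
    """Estimate audio tokens for a clip of the given length and direction."""
    rate = (INPUT_AUDIO_TOKENS_PER_SEC if direction == "input"
            else OUTPUT_AUDIO_TOKENS_PER_SEC)
    return round(seconds * rate)

# A 3-minute (180 s) generated response is ~3,600 output audio tokens,
# even if you only listen to the first 15 seconds of it.
print(estimate_audio_tokens(180, "output"))  # 3600
```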

So, when you’re doing a lot of interruptions, unless you’re actively truncating and culling the audio tokens you aren’t actually using, those will pile up quickly.
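One way to do that truncation is the `conversation.item.truncate` client event, which tells the server to drop the unheard tail of an assistant audio message from context. A hedged sketch of building that event (the exact field names should be checked against the current Realtime API reference; `item_abc123` is a made-up ID):

```python
import json

def build_truncate_event(item_id: str, ms_heard: int) -> str:
    """Build a conversation.item.truncate client event as a JSON string.

    Sent over the Realtime WebSocket when the user interrupts, so the
    audio the user never heard doesn't stay in context and get re-billed.
    """
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,        # the assistant message being cut short
        "content_index": 0,        # the audio content part of that item
        "audio_end_ms": ms_heard,  # keep only what was actually played
    })

# e.g. the user interrupted 15 s into a 3-minute answer:
event = build_truncate_event("item_abc123", 15_000)
```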

If you’re doing 10 rapid exchanges, the output token count of your first response will be included 9 times as audio input tokens, the second 8 times, and so on. So if you terminate a 2,000-audio-token response early and don’t remove it, the cost adds up very quickly with rapid-fire exchanges.
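The compounding above can be sketched as a small calculation (this just models the re-billing pattern described in this post, assuming each exchange replays the full conversation history as input):

```python
def accumulated_input_tokens(response_tokens: list[int]) -> int:
    """Total audio *input* tokens re-billed across a series of exchanges.

    The response from exchange i is replayed as input context in every
    subsequent exchange, so it is billed (n - 1 - i) extra times.
    """
    n = len(response_tokens)
    return sum(tokens * (n - 1 - i) for i, tokens in enumerate(response_tokens))

# Ten exchanges where the first response alone was 2,000 audio tokens
# and was never truncated out of context:
history = [2000] + [0] * 9
print(accumulated_input_tokens(history))  # 2000 * 9 = 18000
```

That 2,000-token response costs you 18,000 additional audio input tokens over the session if you never cull it.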
