Realtime API pricing is wrong, will overcharge

I can’t see what your usage page shows, so I can’t speak to that.

What I can tell you is that you’re charged for all tokens the model generates, whether you hear them or not.

So, if you ask the model for a long story, it might finish the computation in 10 seconds for audio that will take 3 minutes to play back. Even if you interrupt the playback after 15 seconds, you still pay for the full 3 minutes’ worth of generation.

Now, your voice input is tokenized as audio at 10 tokens per second, and is also tokenized as text at roughly 1.3x the token count of o200k_base. Output audio is tokenized at 20 tokens per second of audio, with the same text tokenization at about 1.3x the normal rate. System messages are tokenized as plain text.
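To make the arithmetic concrete, here’s a minimal sketch using the per-second rates described above (these constants come from this post, not from any official pricing table, so verify them against the current docs):

```python
# Rough audio-token estimates, assuming the rates stated above:
# input audio ~10 tokens/second, output audio ~20 tokens/second.
INPUT_AUDIO_TOKENS_PER_SEC = 10
OUTPUT_AUDIO_TOKENS_PER_SEC = 20

def estimate_audio_tokens(seconds: float, direction: str) -> int:
    """Estimate audio tokens for a clip of the given length and direction."""
    rate = (INPUT_AUDIO_TOKENS_PER_SEC if direction == "input"
            else OUTPUT_AUDIO_TOKENS_PER_SEC)
    return round(seconds * rate)

# A 3-minute (180 s) generated response is ~3,600 output audio tokens,
# even if you only listen to the first 15 seconds of it.
print(estimate_audio_tokens(180, "output"))  # 3600
```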

So, when you’re doing a lot of interruptions, unless you’re actively truncating and culling the audio tokens you aren’t actually using, those will pile up quickly.
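One way to do that truncation is the `conversation.item.truncate` client event, which tells the server to drop the unheard tail of an assistant audio message from context. A hedged sketch of building that event (the exact field names should be checked against the current Realtime API reference; `item_abc123` is a made-up ID):

```python
import json

def build_truncate_event(item_id: str, ms_heard: int) -> str:
    """Build a conversation.item.truncate client event as a JSON string.

    Sent over the Realtime WebSocket when the user interrupts, so the
    audio the user never heard doesn't stay in context and get re-billed.
    """
    return json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,        # the assistant message being cut short
        "content_index": 0,        # the audio content part of that item
        "audio_end_ms": ms_heard,  # keep only what was actually played
    })

# e.g. the user interrupted 15 s into a 3-minute answer:
event = build_truncate_event("item_abc123", 15_000)
```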

If you’re doing 10 rapid exchanges, the output token count of your first response will be included 9 times as audio input tokens, the second 8 times, and so on. So if you terminate a 2,000-audio-token response early and don’t remove it, the cost adds up very quickly with rapid-fire exchanges.
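The compounding above can be sketched as a small calculation (this just models the re-billing pattern described in this post, assuming each exchange replays the full conversation history as input):

```python
def accumulated_input_tokens(response_tokens: list[int]) -> int:
    """Total audio *input* tokens re-billed across a series of exchanges.

    The response from exchange i is replayed as input context in every
    subsequent exchange, so it is billed (n - 1 - i) extra times.
    """
    n = len(response_tokens)
    return sum(tokens * (n - 1 - i) for i, tokens in enumerate(response_tokens))

# Ten exchanges where the first response alone was 2,000 audio tokens
# and was never truncated out of context:
history = [2000] + [0] * 9
print(accumulated_input_tokens(history))  # 2000 * 9 = 18000
```

That 2,000-token response costs you 18,000 additional audio input tokens over the session if you never cull it.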
