Realtime API extremely expensive

stevenic · October 4, 2024, 5:46am

lol… I totally broke it. I set max_tokens to 30 and asked it to count to 100. The transcript shows it cutting off after a few tokens, but it actually read back the full response up to 60.

And just because it’s funny… The cut-off transcript is confusing it as to which language we’re speaking.

Diet · October 4, 2024, 5:49am

Now, if you’re also just billed for those 30 tokens, then go grab that bag: Public Bug Bounty: Open Ai - Bugcrowd

stevenic · October 4, 2024, 5:50am

I checked, and they only billed me for the 30 tokens.

stevenic · October 4, 2024, 6:08am

I tried to repro the audio bug I hit and haven’t gotten it to repro, so it was likely just a state fluke somewhere. My gut says it was probably some sort of cache hit, because I’ve been using the same basic prompt, “count to 100 by ones”, all night.

I was hoping that lowering max_tokens would pressure the model to use fewer tokens, but no such luck.

stevenic · October 4, 2024, 6:14am

Unless the audio was cached in the browser state… It was clearly playing back, and the generation said it was in a stop state. The logs didn’t show any buffering. I even clicked on other tabs to see if something else was playing. It’s definitely not reproducing now…

I was going to file a bug report but no real point if there’s not a reliable repro.

stevenic · October 4, 2024, 6:18am

I will say that this thing is pricey even for my taste and I spend a lot of money on OpenAI every month.

I had this clever idea that I was going to use ElevenLabs’ Voice Cloning feature to clone Alloy and then use ElevenLabs for playback of long text, like reading a book or something. That’s when I saw that ElevenLabs is even more expensive…

Diet · October 4, 2024, 6:45am

Tortoise TTS is pretty good nowadays. Been eyeing it for a bit; people seem to be splitting by sentence for “realtime” generation.

chad942 · October 4, 2024, 7:53am

I tried it in the playground this morning. It’s quite disappointing. First, the transcription isn’t great. You need to have a headset and microphone for it to work correctly. And the price: WOW, it’s extremely expensive, especially for testing, and the AI is limited and doesn’t compare to Advanced Voice Mode. Has anyone tried it outside the playground? Is it possible to select GPT-4o-mini as output with Nova’s voice? Is there a way to reduce costs by mixing models, like Deepgram, or Claude for the LLM? Mixing STT, LLM, TTS, and Speech-to-Speech?

johnkears · October 4, 2024, 7:59pm

I too am very disappointed at the cost… I had a 10 minute chat and saw a $6 charge… Given that we are pushing a captured microphone, I am wondering if this is charging for empty frames? This is way too expensive!!

Curious, is anyone running VAD and just pushing in the spoken audio as opposed to streaming the data constantly?

brandonminiwheats · October 4, 2024, 9:11pm

It does charge you for all audio streamed in, even silence. In the playground and in the demo GitHub repo they shared, you can do push to talk.

I will say even with push to talk this is still very expensive. I don’t see this as being economically feasible for a lot of companies out there. I am also curious why only three voices are offered and why none of those voices are the same as the ones in Advanced Voice Mode. In my opinion, the voices offered are not as good as the ones in Advanced Voice Mode.

johnkears · October 4, 2024, 9:21pm

Then we should do our own VAD and only shoot in the captured speech.
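A minimal sketch of what client-side VAD could look like, assuming 16-bit little-endian mono PCM frames and a simple energy threshold. The function names, the threshold of 500, and the hangover length are illustrative assumptions, not anything from the Realtime API; a production setup would more likely use a trained VAD model.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def speech_frames(frames, threshold=500.0, hangover=10):
    """Yield only the frames judged to contain speech.

    A frame counts as speech when its RMS is at or above `threshold`
    (an assumed value; tune it for your microphone and room).
    After speech ends, up to `hangover` more frames are passed through
    so that trailing word endings are not clipped.
    """
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
            yield frame
        elif remaining > 0:
            remaining -= 1
            yield frame
```

Only the frames this generator yields would then be appended to the input audio buffer; everything else stays on the client.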

NormanNormal · October 4, 2024, 9:25pm

You nailed it. This is clearly a very rough release by OpenAI. It technically and fundamentally works, but beyond that, it is extremely flawed.

anon10827405 · October 4, 2024, 9:47pm

You don’t get billed for silence.

You only get billed for tokens spent during the Speech Detected phase.

Diet · October 4, 2024, 9:49pm

Do note that it looks like [inaudible]/noise can/may still be able to take up a speech turn!

(if it’s billed as suspected)

supershaneski · October 4, 2024, 11:41pm

For the noise level, you need to adjust the threshold. It will depend on your ambient noise.

brandonminiwheats · October 4, 2024, 11:48pm

Turns out I was not correct about this; it does not charge for silence.

brandonminiwheats · October 4, 2024, 11:49pm

Thanks for correcting me on this, silence is not charged.

stevenic · October 5, 2024, 12:29am

Background noise can trigger a generation but you can set the sensitivity level.
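For reference, a sketch of how the sensitivity might be set, assuming the server-side VAD is configured through a `session.update` event with a `turn_detection` block. The field names and default values below reflect my understanding of the Realtime API at the time of this thread and should be checked against the current documentation; the helper function itself is hypothetical.

```python
import json


def vad_session_update(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Build a session.update event that tunes server-side VAD.

    Assumed semantics: a higher `threshold` (0.0 to 1.0) requires louder
    audio before a speech turn starts, which helps in noisy rooms.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }
    return json.dumps(event)
```

The returned string would be sent as a text message over the already-open WebSocket connection, for example `vad_session_update(threshold=0.8)` for a noisy environment.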

stevenic · October 5, 2024, 12:41am

I tested a number of different scenarios to figure out exactly what we’re getting billed for:

matthewpottinger · October 6, 2024, 2:36pm

Splitting by sentence is good enough for most uses. The Realtime API really has no benefit that makes the cost worth it versus that, unless you want to have some fun with the tone of voice and really don’t mind paying through the nose for it. Sentence-by-sentence normal TTS is more than fast enough.
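A rough sketch of the sentence-splitting approach, assuming the LLM output arrives as a stream of text chunks. `stream_sentences` is a hypothetical helper, and the regex is a naive splitter that will break on abbreviations like "e.g."; each yielded sentence would be handed to whatever TTS engine you use.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(chunks):
    """Yield complete sentences as soon as they appear in a chunk stream.

    Text is buffered until a sentence boundary is seen, so TTS can start
    on the first sentence while the rest is still being generated.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Usage would look like `for sentence in stream_sentences(llm_chunks): speak(sentence)`, where `llm_chunks` and `speak` stand in for your own streaming source and TTS call.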
