New Realtime API voices and cache pricing

jeffsharris · October 30, 2024, 5:09pm

Today we’re adding five new voices that you can use with the Realtime API: Ash, Ballad, Coral, Sage, and Verse. These voices are much more expressive and you can better tune their emotions, accents, and tones to your needs. Take a listen.

We’re also making our first cost reduction for Realtime with prompt caching:

Text input that hits the cache costs 50% less.
Audio input that hits the cache costs 80% less.

Overall, a typical 15-minute conversation now costs about 30% less than when we first launched the Realtime API. You can try out the new voices in the Playground right now (be sure to use system instructions to direct their tones). Get started with the docs today. We’re looking forward to seeing and hearing what you build!

ivan-luchkin-u · October 31, 2024, 12:48pm

Amazing news! However, there is currently no (documented) way to calculate the total session price. According to the events API reference, the usage field in the response.done event either does not differentiate between cached audio and cached text tokens, or includes only the cached text tokens.

ivan-luchkin-u · October 31, 2024, 12:54pm

Just tested it through the playground. It appears that the proper cached token details were indeed added to the usage of response.done, but cached tokens are still not reported in the total token usage popup in the playground, and the response.done documentation is definitely out of date.
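For anyone totaling a session from these events, the arithmetic might look like the sketch below. Only the 50% and 80% cache discounts come from the announcement. The field names (`input_token_details`, `cached_tokens_details`, `output_token_details`) are an assumption about the layout of the usage block, the per-million rates are assumed list prices that are not stated in this thread, and the example numbers are made up. Check all of them against the current docs and pricing page.

```python
# Assumed list prices in USD per 1M tokens (not stated in this thread).
RATES = {"text_in": 5.00, "audio_in": 100.00, "text_out": 20.00, "audio_out": 200.00}

# From the announcement: cached text costs 50% less, cached audio 80% less.
CACHED_MULTIPLIER = {"text": 0.5, "audio": 0.2}


def response_cost(usage: dict) -> float:
    """Estimate the USD cost of one response from its (assumed) usage block."""
    inp = usage.get("input_token_details", {})
    out = usage.get("output_token_details", {})
    cached = inp.get("cached_tokens_details", {})

    cached_text = cached.get("text_tokens", 0)
    cached_audio = cached.get("audio_tokens", 0)
    # Assumption: the input totals include the cached tokens, so subtract them.
    fresh_text = inp.get("text_tokens", 0) - cached_text
    fresh_audio = inp.get("audio_tokens", 0) - cached_audio

    dollars_per_million = (
        fresh_text * RATES["text_in"]
        + cached_text * RATES["text_in"] * CACHED_MULTIPLIER["text"]
        + fresh_audio * RATES["audio_in"]
        + cached_audio * RATES["audio_in"] * CACHED_MULTIPLIER["audio"]
        + out.get("text_tokens", 0) * RATES["text_out"]
        + out.get("audio_tokens", 0) * RATES["audio_out"]
    )
    return dollars_per_million / 1_000_000


def session_cost(usages: list[dict]) -> float:
    """Sum the estimated cost over every response.done usage block in a session."""
    return sum(response_cost(u) for u in usages)


# Hypothetical usage block, for illustration only.
example = {
    "input_token_details": {
        "text_tokens": 1000,
        "audio_tokens": 500,
        "cached_tokens_details": {"text_tokens": 800, "audio_tokens": 400},
    },
    "output_token_details": {"text_tokens": 100, "audio_tokens": 300},
}
print(f"${response_cost(example):.4f}")
```

Under these assumptions the made-up example comes to roughly $0.083, most of it from audio output, which caching does not discount.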

Fusseldieb · October 31, 2024, 6:05pm

That’s amazing news. Can’t wait to see what I can build with it.

parkforce · October 31, 2024, 6:35pm

Just been to the playground and asked a few questions and burnt through nearly $3.

Is this correct?

It stated there was no cost:

00:54

### assistant

Hi there! How can I help you today?

00:58

### user

what does it cost per minute

00:58

### assistant

Could you clarify what you're asking about the cost per minute for? Is it a phone plan, a service, or something else?

01:07

### user

using this

01:08

### assistant

Oh, interacting with me doesn't cost anything! Feel free to ask me anything you'd like.

mitchell_d00 · October 31, 2024, 6:37pm

I hope not

That’s so pricey, it reminds me of pay-per-minute internet.

karanjain · November 3, 2024, 5:42pm

Will these new voices be made available through the TTS API as well, or just via the Realtime API?

rcasburn · November 4, 2024, 2:26pm

You can never trust the LLM to know anything about itself. Look at the pricing page for the actual details.

j.wischnat · November 4, 2024, 2:33pm

This is definitely not correct.
Check your usage.

Also, here is the pricing page.

j.wischnat · November 4, 2024, 2:37pm

Check this out!

Oct 30
Feature · gpt-4o-realtime-preview · gpt-4o-audio-preview · v1/chat/completions

Added five new voice types in the Realtime API and Chat Completions API.

bertha.kgokong · November 6, 2024, 3:21pm

The Realtime API is amazing and it works so well, I love it. The pricing, though, makes it impractical to use for an actual application, even with a 30% discount. I tested an app I built with just a conversation, and the bill was close to $80 for two days of testing, at just a couple of minutes per day.
So the app is great, but there is no plausible way to monetize it. I just don't see users paying something like $100 for a few minutes of talk with an AI when we can barely get them to subscribe at $20 a month.

jeffsharris · November 6, 2024, 6:35pm

Hey @bertha.kgokong - very glad you like it. And lowering costs is going to be a continued priority.

If you have any session IDs to share, I’d love to debug. It’s definitely not expected that your costs would be anywhere close to $10/min

jochenschultz · November 11, 2024, 5:10pm

Maybe it becomes a higher-value service once you add some data or functionality to it, e.g. as the frontend of a working autocoder or virtual project manager, where the customer explains a product and it creates something like software or a project plan based on the conversation…

jochenschultz · November 11, 2024, 5:15pm

But is $1 per minute the real price? $1 for “hey how are you” and “fine, how are you”?

jeffsharris · November 11, 2024, 11:59pm

@jochenschultz that’s definitely unexpected. When I have that conversation (with a normal length system prompt) I use:
10 audio input
53 text input
76 audio output
26 text output

which is $0.017 (a bit less than 2 cents)
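The arithmetic behind that figure can be sketched as follows. The per-million-token rates are not given in the thread. They are assumed list prices, used here because they reproduce the quoted total from the token counts above.

```python
# Assumed USD prices per 1M tokens; not stated in the thread.
PRICE_PER_MILLION = {
    "audio_input": 100.00,
    "text_input": 5.00,
    "audio_output": 200.00,
    "text_output": 20.00,
}

# Token counts quoted in the post above.
tokens = {
    "audio_input": 10,
    "text_input": 53,
    "audio_output": 76,
    "text_output": 26,
}

# Cost per token type, then the total for the exchange.
cost_by_kind = {
    kind: count * PRICE_PER_MILLION[kind] / 1_000_000
    for kind, count in tokens.items()
}
total = sum(cost_by_kind.values())

for kind, cost in cost_by_kind.items():
    print(f"{kind:>12}: ${cost:.6f}")
print(f"{'total':>12}: ${total:.6f}")
```

With these rates the total rounds to $0.017, and the 76 audio output tokens account for roughly 90% of it.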

can you post the token breakdowns you’re seeing?

if you click on the Logs panel in the Realtime playground you can see the tokens

jochenschultz · November 12, 2024, 2:33am

Oh, I just had a short test in the playground, and it was less than 5 minutes, but in usage it showed ~$5 … which is now gone. I just did some short stuff like hello, how are you… no more than 20 words in total and maybe generated… however, now my history is gone and it doesn’t show that in my usage anymore.

Will give it another try then.

* just checked again… that was last month - and yes, $4.74 for not even 5 minutes. I am pretty sure about that, even though I don’t see the logs for it.

jochenschultz · November 12, 2024, 2:59am

user
You

00:02
assistant
Hello! How can I help you today?

00:02
user
[inaudible]

00:03
assistant
¿En qué puedo ayudarte?

00:04
user
[inaudible]

00:05
assistant
Parece que me estás poniendo a prueba. Estoy aquí para ayudarte, así que dime, ¿qué necesitas?

00:20
user
[inaudible]

00:22
assistant
¡Claro! Pregúntame lo que quieras. ¿Sobre qué tema necesitas información?

00:36
assistant
Estoy aquí para responder tus preguntas. ¿Qué te gustaría saber?

So I said hello (maybe mumbled a bit…) - and then it just started in Spanish and then I mumbled hello again…

I have no idea why it starts talking in Spanish… Because “inaudible”?

And now in another session, after I cleared it, it switched to Korean (note that I did not say “that’s very interesting” - I said it in German: “interessant”).

00:00

### user

[inaudible]

00:01

### assistant

안녕하세요! 무엇을 도와

00:01

### user

Okay.

00:02

### assistant

말씀하세요! 궁금한 게 있으신가요?

00:08

### user

That's very interesting.

00:09

### assistant

흥미롭게 생각해주시니 기쁘네요! 혹시 더 알고 싶은 것이 있나요?

And in another session, when it went to Spanish and I asked it in Spanish to switch to German, it recognized my “Seria posible hablar en aleman?” as "Serait-il possible de parler en allemand?" - which I admit is pretty close - but then it answered in French…

Very strange… maybe it has no access to my language settings and therefore the system prompt gets confused…

Anyway, 3 (or was it 4?) small tests totaling ~3-4 minutes came out to approximately $0.67.

Which is still hefty - but not the expected $3-4.

I guess I have to check my microphone before I complain more about that, though.

* yeah, my headset was not the active microphone, and the built-in microphone of my laptop is pretty far away. That, together with the mumbling (because it is 4am and I am not alone in the house), makes a lot more sense.

After switching to headset it worked just fine.

00:02
user
Okay, jetzt müsstest du mich aber hören, oder? Richtig?

00:06
assistant
Ja, ich höre dich laut und deutlich! Alles funktioniert. Wie kann ich dir helfen?

“Ok, now you should hear me, right?” - “Yes, I hear you loud and clear! Everything works. How can I help you?” in German, for $0.02.

So I guess the extra cost has to do with the application not really being able to recognize what I am saying…

_j · November 12, 2024, 5:21am

As an API developer, can you afford to have users not check their microphone, when static or silence or echo or loopback will trigger the server VAD to make a paid response every few seconds, building a “chat” that makes each response more expensive?

This endpoint calls for more intelligent discrimination of what audio you send, with manual triggering of responses.
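A minimal sketch of that idea, assuming the Realtime API client events `session.update` (with `turn_detection` set to null), `input_audio_buffer.append`, `input_audio_buffer.commit`, and `response.create` behave as documented. The RMS gate, its threshold, and the function names are hypothetical, and the caller is expected to send the returned JSON strings over its own WebSocket.

```python
import base64
import json
import math
import struct

# Hypothetical loudness threshold for 16-bit PCM; tune it per microphone.
SILENCE_RMS_THRESHOLD = 500


def rms(pcm16: bytes) -> float:
    """Root-mean-square level of little-endian 16-bit mono PCM."""
    n = len(pcm16) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm16[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def disable_server_vad() -> str:
    """Event that turns off server-side turn detection for the session."""
    return json.dumps({"type": "session.update", "session": {"turn_detection": None}})


def events_for_turn(pcm16: bytes) -> list[str]:
    """Events for one manually triggered turn, or [] if the audio is too quiet."""
    if rms(pcm16) < SILENCE_RMS_THRESHOLD:
        return []  # do not pay for a response to silence or faint noise
    audio_b64 = base64.b64encode(pcm16).decode("ascii")
    return [
        json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64}),
        json.dumps({"type": "input_audio_buffer.commit"}),
        json.dumps({"type": "response.create"}),
    ]
```

A plain energy gate like this will not reject echo or loopback, which can be as loud as speech. It only illustrates where a client-side check would sit before a response is requested.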

Then there is manually discarding a socket before a chat can grow too long: the only cap on context length is OpenAI’s own parameter, which limits it to the model’s maximum context window, and it is not one you can supply.
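To illustrate why a long-running chat makes each response more expensive, here is a simplified model in which every response re-reads the whole conversation so far as input. The function name and all token counts are hypothetical, and the model ignores cache discounts and any server-side truncation.

```python
def billed_input_tokens(turns: int, system_tokens: int, tokens_per_turn: int) -> list[int]:
    """Input tokens billed for each response when the full history is re-read.

    Simplifying assumption: every user turn and every assistant reply adds
    the same number of tokens to the conversation.
    """
    billed = []
    history = system_tokens
    for _ in range(turns):
        history += tokens_per_turn  # the new user audio joins the context
        billed.append(history)      # the model reads everything so far
        history += tokens_per_turn  # the assistant reply joins the context too
    return billed


per_response = billed_input_tokens(turns=3, system_tokens=50, tokens_per_turn=100)
print(per_response, sum(per_response))
```

In this model three turns bill 150, 350, and 550 input tokens, 1,050 in total, where three fresh sessions would bill 150 each. The total grows roughly with the square of the number of turns, which is the case for resetting a session before it gets long.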

Then, does deconvolution of tokens to audio really add 10x as much computational cost as generating the AI tokens themselves… or does it just add that much to your bill?

kishal · November 18, 2024, 4:38pm

When I try to change the voice, I get this:

TS2322: Type '"ash"' is not assignable to type '"alloy" | "shimmer" | "echo" | undefined'.

I cannot use the other voices. I want to use a male voice, but I am unable to.
