Agreed, I played around with integrating the Realtime API into my app over the weekend and was shocked to find it set me back $10. Really impressive capabilities, but the cost makes this unusable at the moment.
A couple of things I'm doing in my app to help keep costs down…
- I switched to using a "push to talk" button, which not only helps make costs more predictable, it also works around sensitivity issues with the current VAD.
- I added a text input panel that bypasses the RT API altogether. I don't currently have the conversation history from the text and voice chats synced, but you should be able to do that.
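For anyone curious about the push-to-talk approach: here's a minimal sketch of the event flow, assuming server VAD is disabled in the session config so audio is only sent and committed while the button is held. The frame data and helper are illustrative, not from the post; the event names are my reading of the Realtime API docs.

```python
import base64
import json

def ptt_events(frames: list[bytes], button_held: bool):
    """Yield Realtime API client events for one push-to-talk turn.

    Assumes server VAD is turned off (session turn_detection disabled),
    so nothing is processed until the button is released.
    """
    if not button_held:
        return  # button not pressed: send nothing at all
    for frame in frames:
        # Stream raw audio frames while the button is held down.
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(frame).decode("ascii"),
        })
    # Releasing the button commits the buffer and requests a response.
    yield json.dumps({"type": "input_audio_buffer.commit"})
    yield json.dumps({"type": "response.create"})

events = list(ptt_events([b"\x00\x01", b"\x02\x03"], button_held=True))
print(len(events))  # 4: two appends, one commit, one response.create
```

The point of the gate is that no audio (and therefore no billable input) ever leaves the client unless the user is actively holding the button.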
Here's where I'm at with that, UI-wise:
I don't really have a need for a chat feed in my current UX, but I do have a "toast" mechanism, so when I go the text-only route I can pop up the answer as a piece of toast.
BTW… I coded none of this… it was all Claude 3.5 Sonnet. o1-mini is a decent coder, but Sonnet is generally as good and more consistent. It's also faster and cheaper.
Agreed. I integrated it into Unity to make a realtime avatar. I used 4o to build and translate the API instructions from Python to C#, attached it to an animated character with lipsync software, and used a third-party WebSocket library for Unity to access the API. After a day of testing I'm $40 down. It works great, but I won't use it until it's cheaper!
I imagine the prompt caching mechanism isn't quite where they want it to be. I hope that once they iron out the kinks (and if that's the actual problem), the operational cost of Realtime could come down dramatically.
Let's hope it's just an unfortunate combination of crunch time and having to release something, resulting in a half-baked delivery.
At least we get a preview of what the API and its capabilities are going to look like, so you'll have a bit of a head start once prices become reasonable.
I see I'm one of many, but I disagree with a lot of the prices quoted here. I just used it for what the usage panel tells me was 75 seconds and was billed $5.28, with only 11k input and 2k output tokens. I calculated that it overcharged me at least 2.75x what the pricing page suggests. At that rate, the price per hour for me would be $282.
Hello vdhavala, I would like to hear more about the alternative approach you used instead of the Realtime API. Can you please guide me?
Thank you
GBRL,
Sorry, since our effort is a commercial one, I can't discuss it in this forum.
Loved the experience, but I can't agree with the prices. It can't be budgeted for.
I did some testing with an agent we currently use in production with text only. The costs are currently way too high to be usable in production. Looking at salaries in the Netherlands, the Realtime API came out at 2x the cost of a human in our test!
Just wanted to add my own experience. I think there must be an issue somewhere, either in the usage graph or the billing system, I don't know which.
I just talked for a couple of minutes with the Realtime API:
- 11 requests
- Input tokens: 36k (I did not talk 36k tokens' worth, probably around 2k max)
- Output tokens: 1.5k
- Cost: $14.80
So let's imagine I really had spoken those 36k tokens (which I did not): at OpenAI's pricing, 36k input tokens should have cost $3.60 and 1.5k output tokens should have cost $0.30.
So even in that scenario, it should have cost about $3.90, not $14.80.
It just doesn't make sense.
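The arithmetic above is easy to check. A minimal sketch, using the per-million rates implied by the figures in this post ($100/1M audio input tokens, $200/1M audio output tokens; treat those rates as assumptions derived from the post, not official numbers):

```python
# Rates implied by the post above (assumptions, not official pricing):
# $100 per 1M audio input tokens, $200 per 1M audio output tokens.
AUDIO_IN_PER_M = 100.0
AUDIO_OUT_PER_M = 200.0

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Expected bill in dollars for a Realtime session at the assumed rates."""
    return (input_tokens / 1_000_000) * AUDIO_IN_PER_M \
         + (output_tokens / 1_000_000) * AUDIO_OUT_PER_M

# The session above: 36k input and 1.5k output tokens, billed $14.80.
print(f"expected ≈ ${estimated_cost(36_000, 1_500):.2f}")  # expected ≈ $3.90
```

Even granting the inflated 36k input count, the expected bill is about $3.90, so the $14.80 charge is roughly 3.8x the list-price calculation.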
Yeah, that's why I'm keeping my distance from the Realtime API. Hope they cut prices soon.
I work for an insurance company and see many use cases for this, but the price is way too high for this ever to go into production.
- Has there been any feedback from OpenAI on this?
- Is the pricing currently just bugged?
I genuinely want to build apps using the Realtime API, but just prototyping has cost me $40+ already. Literally just from trying different prompts and testing my code.
I'm also looking for some feedback from OpenAI on this. It seems like many people have essential questions that haven't been answered.
There are no bugs in the Realtime API pricing. People just aren't taking into consideration the fundamental process of conversation management in all LLMs, or how the interruption feature works.
Historically, OpenAI has released products as previews and later managed to drop the price while increasing the quality.
The price should not be a factor in why you aren't prototyping this feature.
As of right now, I would advise anyone using this to have an intelligent switch mechanism between Realtime and the standard STT->TTS paradigm.
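One way such a switch could look: route to Realtime only when low latency actually matters and the estimated cost fits the budget, and fall back to STT->TTS otherwise. This is a sketch under stated assumptions; the thresholds, the per-minute cost guess, and all names are illustrative, not anything from OpenAI.

```python
from dataclasses import dataclass

@dataclass
class SessionPolicy:
    """Inputs for deciding which speech pipeline to use (illustrative)."""
    latency_sensitive: bool   # does the UX need barge-in / instant replies?
    est_minutes: float        # expected call length
    budget_per_call: float    # dollars we are willing to spend on this call

def choose_pipeline(p: SessionPolicy) -> str:
    # Rough cost guess: calls reported in this thread ran about $2-$5
    # per minute, so $3/min is a midpoint assumption.
    est_realtime_cost = p.est_minutes * 3.0
    if p.latency_sensitive and est_realtime_cost <= p.budget_per_call:
        return "realtime"
    return "stt_tts"  # cheaper, higher-latency fallback

print(choose_pipeline(SessionPolicy(True, 2.0, 10.0)))  # realtime
print(choose_pipeline(SessionPolicy(True, 5.0, 10.0)))  # stt_tts
```

The hard part, as noted later in the thread, is hiding the latency jump when the fallback path is taken; the policy itself is the easy half.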
Hi @johan.holmberg You might find this thread helpful:
Realtime API Pricing: VAD and Token Accumulation - A KILLER
@anon10827405, consider the following:
The STT->TTS paradigm (the old way) is akin to dial-up internet when it comes to providing a seamless, realtime speech experience. No question about it.
The RealTime concept is a paradigm shift (the new way).
For a speech application to deliver an experience worth paying for, this shift is required, especially now that the world has gotten a taste of what AI-powered speech can really do (e.g. 11Labs, OpenAI Advanced Voice, etc.). For a commercially viable product, you can't settle for the "old way". I understand it's still needed as a backup, but we're meant to be looking forward, not backward.
As for the pricing, if you look at this thread, you'll note that OpenAI acknowledges issues with the way the API is currently priced, the primary one being how conversation tokens (audio in/out) are carried forward from turn to turn. This massively inflates the token count and disregards the caching possibilities.
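The carry-forward effect compounds quickly, because every turn re-sends the entire conversation history as input. A rough model, with an assumed per-turn audio size (the numbers are illustrative, not measured):

```python
def billed_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a session when the full history
    is carried forward (and re-billed) on every turn, with no caching."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # this turn's new audio joins the history
        total += history            # the whole history is billed again
    return total

# 10 turns of ~500 audio tokens each: only 5k tokens of actual speech...
print(billed_input_tokens(10, 500))  # ...but 27,500 input tokens billed
```

That quadratic growth is consistent with the reports above of ~2k tokens of actual speech producing 36k billed input tokens, and it is exactly the part that effective caching would discount.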
There are workarounds, and there's value in trying to work around this issue. There's also ongoing communication from OpenAI about rolling out caching changes in the next couple of weeks.
Unless you're funded for R&D, I think that for a lot of devs on this forum, prototyping cost has to make sense. Right now it's tough, when each 3-4 minute call sets you back anywhere from $6-$16 depending on the structure of the call.
As for your suggestion of an intelligent switch, it would be great to learn more about how to intelligently handle the latency introduced by a switch like that.
The price should not be a factor in why you aren't prototyping this feature.
When a day of prototyping sets you back $10, it's a factor. Not everyone has tons of money to sink. It's great if they can drop the price later, but right now it's not usable.
These are basic fundamentals of ALL Large Language Models.
They are stateless and need to know the context to continue the conversation. This shouldn't be a shock to anyone.
Sure, if you are looking for more tool usage and latency isn't a deciding factor, then you can use the standard method. There are still a lot of nicks in the Realtime API that IMO make it insufficient for production.
When OpenAI released Davinci, input was priced at $0.02/1k tokens. For reference, o1-preview is priced at $0.015/1k input tokens.
People spent a lot of time and effort performing numerous optimizations to keep costs down.
I don't know what you are trying to ask for. You want them to eat more of the costs so that more people can prototype their technology?
If you can't justify the setup costs, or can't set up a testing environment, then the option is to wait. Historically, OpenAI drops prices within a short amount of time and also changes/adds a lot more functionality, sometimes requiring people to completely update their own paradigms.
In time the price will come down, and the service will be more stable.
I understand, but this is just how life is.
You can't seriously drive a car and then get mad about the gas price. I mean, you can, it just won't do anything. If you see incredible value in these services, then find a way to afford the gas.
That's the reason "cached_tokens" exist, which OpenAI isn't currently applying in the API calls, and which is exactly what they have said they are fixing now.
If caching is used effectively, the carryover cost, which is the current problem, will no longer be an issue.