Realtime API extremely expensive

@anon10827405, consider the following:

The STT>TTS paradigm (the old way) is akin to dial-up internet when it comes to delivering a seamless, realtime speech experience. No question about it.
The RealTime concept is a paradigm shift (the new way).
For a speech application to deliver an experience worth paying for, this shift is essential, especially now that the world has had a taste of what AI-powered speech can really do (e.g. 11Labs, OpenAI Advanced Voice, etc.). For a commercially viable product, you can’t really want to settle for the “old way”. I understand it’s still needed as a backup, but we should be looking forward, not backward.

As for pricing, if you look through this thread, you’ll see that OpenAI acknowledges the problem with how the API is currently priced. The primary issue is that conversation tokens (audio in/out) are carried forward from turn to turn, which massively inflates the token count and ignores the savings caching could provide.
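To make the carry-forward effect concrete, here is a minimal sketch. It assumes every turn resends the full audio history (both the user’s audio and the model’s audio replies) as input, and that each turn adds a constant number of tokens. The token counts and the `Prices` values are made-up placeholders, not OpenAI’s actual rates, and `audio_in_cached` is a hypothetical discount included only to show what caching the repeated prefix could change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prices:
    """HYPOTHETICAL USD prices per 1M tokens; substitute current published rates."""
    audio_in: float
    audio_out: float
    audio_in_cached: Optional[float] = None  # None means no cache discount


def billed_input_tokens(turns: int, in_per_turn: int, out_per_turn: int) -> List[int]:
    """Input tokens billed on each turn when the full audio history is resent."""
    billed = []
    history = 0
    for _ in range(turns):
        billed.append(history + in_per_turn)   # prior context + new user audio
        history += in_per_turn + out_per_turn  # both sides carry into the next turn
    return billed


def session_cost(turns: int, in_per_turn: int, out_per_turn: int, prices: Prices) -> float:
    """Total USD for a session; carried-forward history may be billed at a cached rate."""
    cached_rate = prices.audio_in if prices.audio_in_cached is None else prices.audio_in_cached
    history = 0
    total = 0.0
    for _ in range(turns):
        total += history * cached_rate            # previously seen context
        total += in_per_turn * prices.audio_in    # fresh user audio this turn
        total += out_per_turn * prices.audio_out  # model's audio reply
        history += in_per_turn + out_per_turn
    return total / 1_000_000


if __name__ == "__main__":
    # Made-up workload: 20 turns, 300 audio tokens in and 600 out per turn.
    per_turn = billed_input_tokens(20, 300, 600)
    print(per_turn[0], per_turn[-1], sum(per_turn))
```

With these placeholder numbers, the first turn bills 300 input tokens but the twentieth bills 17,400, and the session totals 177,000 input tokens versus 6,000 if only new audio were billed. Growth is quadratic in the number of turns, which is why a discounted rate on the repeated prefix matters so much.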

There are workarounds, and there is value in trying to work around this issue. There is also the ongoing discussion from OpenAI about rolling out caching changes in the next couple of weeks.

Unless you’re funded for R&D, prototyping costs have to make sense, and I think that’s true for a lot of devs on this forum. Right now it’s tough when each 3-4 minute call can set you back anywhere from $6 to $16, depending on how the call is structured.

Regarding your suggestion of an intelligent switch, it would be great to learn more about how to handle the latency that a switch like that introduces.
