Hi there, why not support more voices, or synthesized voices? I’m making a children’s conversational bot and I really need children’s character voices.
You could try connecting it with ElevenLabs.
Thanks for the advice, but I prefer OpenAI’s advanced modeling capabilities, including function calling, etc.
ElevenLabs supports function calling. It has pretty much everything OpenAI has, but is even more developer friendly, and it offers some things OpenAI doesn’t.
Not to advertise, this is just a simple fact.
The only slight difference is that the Realtime API is true audio-to-audio, while ElevenLabs chains Speech-to-Text and Text-to-Speech.
In 90% of cases, you won’t need true audio-to-audio functionality (i.e. steerable voices, etc.).
In ElevenLabs, just select a voice that already sounds like a “children’s story voice” (or generate your own).
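To make the cascaded approach concrete, here is a minimal sketch of what an ASR → LLM → TTS turn loop looks like. All three stage functions are hypothetical placeholders, not real ElevenLabs or OpenAI calls; in practice each would be an API request.

```python
# Sketch of a cascaded (Speech-to-Text -> LLM -> Text-to-Speech) voice agent.
# All stage functions below are hypothetical stubs for illustration only.

def transcribe(audio_in: bytes) -> str:
    """Placeholder ASR stage: would call a speech-to-text service."""
    return "tell me a story"

def generate_reply(user_text: str) -> str:
    """Placeholder LLM stage: would call a chat model (with function calling, etc.)."""
    return f"Once upon a time... (replying to: {user_text})"

def synthesize(reply_text: str, voice_id: str) -> bytes:
    """Placeholder TTS stage: would render audio in a preselected child-like voice."""
    return reply_text.encode("utf-8")

def handle_turn(audio_in: bytes, voice_id: str = "child_storyteller") -> bytes:
    """One conversational turn: audio in -> text -> reply text -> audio out."""
    text = transcribe(audio_in)
    reply = generate_reply(text)
    return synthesize(reply, voice_id)
```

The point of the sketch is that the voice is just a TTS parameter in this architecture, which is why picking a voice that already sounds right is usually enough.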
I see, it’s ASR + LLM + TTS. But I’m in the 10% of scenarios: I not only need a custom voice, I also need emotion handling, both recognizing emotions in the input and expressing them in the output, e.g. “Speak slower,” “Keep it down,” “Be stern!” and so on.
I see, then the Realtime API might be a better fit for your case.
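For reference, steering tone and pacing with the Realtime API is done through plain-language session instructions rather than a separate TTS setting. Below is a sketch that only builds the `session.update` event payload; actually sending it over the websocket connection (and the exact set of supported session fields) is omitted, so treat the field names as an approximation of the documented event shape.

```python
import json

def make_session_update(instructions: str, voice: str = "alloy") -> str:
    """Build a Realtime API session.update event as a JSON string.

    `instructions` is free-form steering text, e.g. asking the model to
    speak slower or sound stern; `voice` is one of the built-in voices.
    """
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
        },
    }
    return json.dumps(event)

# Example: steer the model toward a calm bedtime-story delivery.
msg = make_session_update("Speak slower and more gently, like reading a bedtime story.")
```

This is what makes the audio-to-audio path attractive for your use case: the same instruction channel covers pacing, volume, and emotional tone, and the model can also react to emotion it hears in the input.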
I wish they would add more voices as well, or provide a way to make our own, but we’re talking about OpenAI here. Sadly they lean less and less toward a “developer first” experience. Let’s just hope they add something like this in the future.