I’m new to the community and I have limited programming experience. I was excited to see the new update where we can create a GPT with limited coding knowledge. I have begun creating GPTS on ChatGPT 3.5 on the Pro Plan using the new update.
I want to make a GPT where instead of having the user experience be typing out and having text based conversations, I want to have the conversation be able to happen with the GPT responding via voice and the user can also respond by voice.
Is there a way I can turn the GPTs responses to voice and also have the user respond via their voice and have the GPT respond via voice audio? So in short, I want to make the experience feel more like a natural conversation with someone rather than typing everything out.
Is there an easy way to do this? If so, how would one go about this?
Yes. It’s built in. If you are using the app all you need to do is press the headphone icon and it starts a continuous voice conversation.
It’s a little glitchy though, but that’s TTS engines in general.
If you are talking about API then yes, also possible. You just need to facilitate it yourself.
Opinion: You may want to consider something like Eleven Labs. They are kicking ass with TTS.
They offer a lot more control. You can create and tune your own model (can even use your own voice, I have David Attenborough as my personal assistant, it’s great!), it’s cheaper, there’s a vast library of available models, and they have some good prompting elements like SSML (I think it’s called) (it’s for pronunciation) and pausing.
Yes, as @RonaldGRuckus said, if you download the app, you can just speak with your gpt. It’s an AMAZING experince!.
However, I’m making a React-Native app where my goal to reach the same. With extra feature > To generate images with dalle while its talking, so if I need a story + an img, I will hear the story and when its finish the audio, I will see the img as well (or even before).
At the current stage with GPT, it will speak the story and then start generating the img, which is little anoying.
I’ll do it open source so if you or anyone is interested, feel free to reach me aout
Hi. After reading the information in this thread I have a question. Can I organize and set up simultaneous voice translation during an online meeting? If it is possible how can I technically realize it?
dmisi98 I haven’t tried the voice chat feature yet, no interlocutor. I’m in the middle of the night, so I’ll try it in the morning. Do you have any positive experience using simultaneous interpretation with voice chat?
I’ve taken your recommendation and am blown away with Eleven Labs, but is there a way to merge it with Chat GPT? I love the conversational capabilities of Chat GPT but love the natural TTS of Eleven Labs. It would be great if I could merge the two somehow.
Please note: I have very limited - almost no - coding ability!