Can (custom) GPT speak and respond via voice?

Hi Everyone!

I’m new to the community and I have limited programming experience. I was excited to see the new update where we can create a GPT with limited coding knowledge. I have begun creating GPTS on ChatGPT 3.5 on the Pro Plan using the new update.

I want to make a GPT where instead of having the user experience be typing out and having text based conversations, I want to have the conversation be able to happen with the GPT responding via voice and the user can also respond by voice.

Is there a way I can turn the GPTs responses to voice and also have the user respond via their voice and have the GPT respond via voice audio? So in short, I want to make the experience feel more like a natural conversation with someone rather than typing everything out.

Is there an easy way to do this? If so, how would one go about this?


Yes. It’s built in. If you are using the app all you need to do is press the headphone icon and it starts a continuous voice conversation.

It’s a little glitchy though, but that’s TTS engines in general.

If you are talking about API then yes, also possible. You just need to facilitate it yourself.

Opinion: You may want to consider something like Eleven Labs. They are kicking ass with TTS.

They offer a lot more control. You can create and tune your own model (can even use your own voice, I have David Attenborough as my personal assistant, it’s great!), it’s cheaper, there’s a vast library of available models, and they have some good prompting elements like SSML (I think it’s called) (it’s for pronunciation) and pausing.

1 Like

Do they offer streaming option as well? So no need to wait the end of the audio.

Yes, as @RonaldGRuckus said, if you download the app, you can just speak with your gpt. It’s an AMAZING experince!.

However, I’m making a React-Native app where my goal to reach the same. With extra feature > To generate images with dalle while its talking, so if I need a story + an img, I will hear the story and when its finish the audio, I will see the img as well (or even before).
At the current stage with GPT, it will speak the story and then start generating the img, which is little anoying.

I’ll do it open source so if you or anyone is interested, feel free to reach me aout :slight_smile:

1 Like

Hi. After reading the information in this thread I have a question. Can I organize and set up simultaneous voice translation during an online meeting? If it is possible how can I technically realize it?

Yes they offer streaming

Did you tried in ChatGPT app the voice chat function?

dmisi98 I haven’t tried the voice chat feature yet, no interlocutor. I’m in the middle of the night, so I’ll try it in the morning. Do you have any positive experience using simultaneous interpretation with voice chat?

I really love it. Only issue, it can’t make parallel jobs like speaking and generating img in same time. But possibli you do not need that :slight_smile:

1 Like

I’m a bit confused, everyone here is saying its built into the app, but when I use the app, there is no headphones icon and seemingly no way to talk to GPT.

Is this only included if you pay for 4.0?

Hi. Yes, it is only available with the paid version. It simply doesn’t show as an option on the free one.

I’ve taken your recommendation and am blown away with Eleven Labs, but is there a way to merge it with Chat GPT? I love the conversational capabilities of Chat GPT but love the natural TTS of Eleven Labs. It would be great if I could merge the two somehow.

Please note: I have very limited - almost no - coding ability!

Unfortunately you can’t connect ElevenLabs with GPTs. It would only be something workable with Assistants (API)

1 Like

There is a chrome (and edge) extension called Talk-to-ChatGPT (its not an app, it only works in a browser) that allows you to chat with chatGPT and integrate elevenlabs voices.

You just need to enter your elevenlabs API. No coding required.