Advanced Voice Mode is out and I’m super curious about what voice other folks chose and why? (And thoughts on voice mode in general)
They released a whole slew of new accents, and the response time is incredibly fast. They’ve certainly got the tête-à-tête down, and that’s not nothin’. For example, you can interrupt the voice mid-sentence. I haven’t had a chance to test multiple speakers, and it doesn’t have vision capability yet, but that’s okay with me. (One baby step at a time.)
I still like Juniper, actually: open and upbeat, with a cultured American accent.
I chose that voice because I like the idea of having a bright coworker to work with that I would not—to be perfectly blunt—become emotionally attached to.
The British and Aussie accents are cool—but in America, every AI voice ever has a British or Aussie accent. ><
As a power user of GPT’s voice features, particularly for brainstorming, I’ve tested the new advanced voice mode as much as possible within the short time limit on day one. Here are my initial thoughts:
Pros:
Natural conversation flow with intuitive interruption in advanced mode
I like to move through conversations quickly, so the ability to speed things up and interrupt redundant GPT responses is crucial. Standard mode’s loss of manual interruption significantly impacts its usefulness for brainstorming.
Verdict: Currently more gimmicky than practical. Crucial improvements needed:
Context-adding options (text, documents)
Manual interruption via screen touch in both advanced and standard voice modes
Removal of conversation length restrictions
While limited now, I can see the potential. Once these issues are addressed, particularly the ability to manually interrupt GPT mid-speech in both modes, this feature will be a game-changer for productive AI interactions.
Honestly, I wish we were able to see the text on the screen, rather than the animation. This is one of the biggest reasons I usually do not use voice mode in that way. Instead, I long press and select “read out loud.” I prefer it this way even though it takes more time.
I just found out today that Sky was inspired by Her. Regardless, I didn’t think the voice sounded warm, so I never used that one lol. It makes me want to watch Her to see what that was all about. Anyway, I have grown accustomed to Ember. That voice has been my all-time favorite up to now. I recently saw that there are also:
Arbor (Male voice) (UK or Aussie accent?)
Sol (Female voice) (American)
Spruce (Male voice) (African American)
Vale (Female voice) (Irish accent)
Maple (Female voice) (American)
It is too early to tell, but I look forward to trying the other voices and perhaps my top 3 will change in time:
Ember
Maple
Sol
I apologize in advance for any errors in accent ID.
Yeah, I agree. To get around this, I run the voice feature on my phone and keep the PC window open at the same time, refreshing it to see the text as I go.
Correct, @merefield. Sky was removed a while ago now because actor Scarlett J. felt it too closely resembled her voice from the movie “Her.” OpenAI respected her request to remove it, so the Sky voice option is a thing of the past. I wonder if Maple and Sol are close resemblances? Idk, but they’re all really cool imo. Hopefully that helps clear the air for any and all who come across this.
I asked for a British voice and we got two! Arbor is a bit too ‘Estuary English’ for me, but I like Vale. She drops a few H’s here and there but overall she sounds like a chatty, normal Englishwoman.
Now we need a Stephen Fry or a Maggie Smith
Hahahahaha. I’m not sure Maggie Smith would appreciate being resurrected like that (RIP Guinella)… But a voice with stern patrician disapproval would be something.
I was wondering if there are any useful biases between voices?
They seem to use different language…
Hey
Hi there
Hey what’s up
It would be nice to know more about what the profiles are on these. I guess some are faster to converse with than others, or maybe better on different topics for nuanced reasons?
Do the voices have deeper profile info anywhere than ‘Open and upbeat’ or ‘Animated and Earnest’?
I take it the voice chosen has a bearing on the returned text, or do the voices just introduce themselves differently on the selection screen?
I don’t know. Probably, given your observation. As for other topics and deeper profiles, there might not be anything other than “your voice is animated and earnest,” in the prompt, as simple as that.
Though as I write that, it seems unlikely that there isn’t a lot of research behind each voice.
Having a feature to reduce speed of response, or knowing if certain voices were slower to respond, or spoke with a slower cadence, would also be super helpful.
In demonstrating this to others, the immediacy can be a little stressful when one wants to pause and compose one’s thoughts.