I wanted to share some feedback on the new speech interaction feature in the ChatGPT app, which I’ve been experimenting with.
First off, the level of naturalness in the speech synthesis is impressive. The occasional “uhhs” and subtle stutters add a remarkably human touch to the interactions, enhancing the overall experience.
However, I noticed a few areas that could use some refinement. When interacting in Portuguese, French, and Italian, the speech still carries a slight American accent. While it’s amusing, it detracts from the authenticity you’re likely aiming for in multilingual support.
Moreover, the speech recognition struggles with brief, non-English input. Even when I pronounce words with phonemes absent in English, it often defaults to English interpretations. This contrasts with the app’s speech-to-text (dictation) feature, which handles language detection quite well.
Additionally, there’s a minor hiccup when users pause to think: the system tends to respond prematurely. Extending the wait time before voice recognition concludes and responds would allow for a more natural pace in conversation, reflecting real-life interactions where people may need a moment to gather their thoughts.
Lastly, the frequent prompts asking if there’s “something else you want to talk about” become repetitive. I understand the prompt is meant to keep the conversation flowing, but using it less often would be less intrusive and more comfortable for the user.
Thank you for considering my suggestions. The feature is a great step forward, and with a few tweaks, it could be even better.
(Written with the help of GPT-4)