Feedback on OpenAI's Speech Feature in ChatGPT App

I wanted to share some feedback on the new speech interaction feature in the ChatGPT app, which I’ve been experimenting with.

First off, the level of naturalness in the speech synthesis is impressive. The occasional “uhhs” and subtle stutters add a remarkably human touch to the interactions, enhancing the overall experience.

However, I noticed a few areas that could use some refinement. When interacting in Portuguese, French, and Italian, the speech still carries a slight American accent. While it’s amusing, it detracts from the authenticity you’re likely aiming for in multilingual support.

Moreover, the speech recognition seems to struggle with brief, non-English input. Even when I pronounce words containing phonemes absent from English, it often defaults to English interpretations. This contrasts with the app's speech-to-text feature, which handles language detection quite well.

Additionally, there’s a minor hiccup when users pause in thought: the system tends to respond prematurely. Extending the wait time before voice recognition concludes and the system responds would allow for a more natural pace in conversation, reflecting real-life interactions where people may need a moment to gather their thoughts.

Lastly, the frequent prompts asking if there’s “something else you want to talk about” can be repetitive. While I understand the intent behind this prompt to keep the conversation flowing, an occasional break from this pattern might be less intrusive and more comfortable for the user.

Thank you for considering my suggestions. The feature is a great step forward, and with a few tweaks, it could be even better.

(Written with the help of GPT-4)

  1. Ability to begin, pause and end audio conversation hands free / purely via natural language
  2. Ability to modify spoken avatar’s settings/configurations (eg humour settings)
  3. Ability to see the transcribed text in real time, without having to refresh the browser/exit out of talk mode on mobile
  4. Stops listening automatically after x seconds (configurable)
  5. Available across any/all devices/platforms - eg available via the web app