Feature request: enable true hands-free voice conversations with a custom delay setting

The voice interface combined with custom instructions is a powerful tool for Socratic learning.

The app begins responding too quickly for users who… think about… their words more… carefully. The result is that the model starts replying before the user has finished their voice input.

In the mobile app, we can currently hold down the circle to suppress input submission. However, this requires physically interacting with the device.

I would like to request that this response delay be exposed as a setting in the UI, say 1–10 seconds. This would enable users to leave the app running nearby while they multitask, using it as a personal Socratic tutor or learning oracle, without having to physically touch the device to hold off input submission until they're finished speaking.

This would allow us to leave it running while driving, exercising, and so on, combining learning with other activities.

The benefit would be significant, and I imagine this would be low-hanging fruit: add a new voice response delay setting, store it for the user, and then load it if it exists before submitting the response when in voice mode.
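To illustrate what I mean, here is a rough sketch of the logic. All the names here (the storage key, the pause/resume events, the submit hook) are made up for illustration and are not the app's real API:

```typescript
// Hypothetical sketch of a configurable voice-response delay.
// Assumes the voice client already reports pause/resume events
// and exposes a callback for submitting the transcript.

const DEFAULT_DELAY_SECONDS = 1;

function loadResponseDelaySeconds(): number {
  // Load the per-user setting if it exists, otherwise fall back to the default.
  const stored = localStorage.getItem("voiceResponseDelaySeconds");
  const parsed = stored !== null ? Number(stored) : NaN;
  // Clamp to the proposed 1–10 second range.
  return Number.isFinite(parsed)
    ? Math.min(10, Math.max(1, parsed))
    : DEFAULT_DELAY_SECONDS;
}

let pendingSubmit: ReturnType<typeof setTimeout> | null = null;

// Called whenever the recognizer detects a pause in the user's speech.
function onSpeechPause(
  transcript: string,
  submitTranscript: (t: string) => void,
): void {
  const delaySeconds = loadResponseDelaySeconds();
  if (pendingSubmit !== null) clearTimeout(pendingSubmit);
  // Only submit once the configured delay has elapsed with no further speech.
  pendingSubmit = setTimeout(() => submitTranscript(transcript), delaySeconds * 1000);
}

// If the user resumes speaking, cancel the pending submission and keep listening.
function onSpeechResumed(): void {
  if (pendingSubmit !== null) {
    clearTimeout(pendingSubmit);
    pendingSubmit = null;
  }
}
```

The idea is simply a debounce on end-of-speech: the timer restarts whenever the user keeps talking, so slower or more deliberate speakers never get cut off.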

Thanks for considering!

4 Likes

Same need for language learning. I'm considerably slower in my second language and often get cut off mid-sentence.

1 Like

I too would like a longer pause. Ultimately, it would feel more natural to me if the GPT could pick up on nuances of tone and language, the way we do when we can't see facial expressions or body language and have to wait for an opportunity to respond.
Being able to prompt, or add to memory, commands that trigger responses, or to tell it to wait a specific amount of time, would add a sense of personalization.
It's nice that we can interrupt hands-free and finish speaking; it feels natural.
Since I started using ChatGPT, it hasn't learned the natural, real me, because I talk too fast, trying not to pause or take a breath, to get it all out. It's learning my word choice and style, as well as my pace and rhythm, but none of it is true to me because I rush my thoughts and speech. I believe this causes confusion in the GPT's responses, so users end up clarifying or rephrasing unnecessarily instead of communicating correctly the first time.
Last thought: the live video feature is amazing. It can recognize the ASL alphabet. I believe the LLM has the potential to learn facial expressions along with nuances of sound, allowing it to learn when it can respond and to develop a sense of a person's behavior and/or mood.
FYI, it does tell us we can ask it to pause longer or teach it prompts for when to respond. I can send screenshots or a screen recording.
Thank you, OpenAI.