Feature request: enable true hands-free voice conversations with a custom delay setting

The voice interface combined with custom instructions is a powerful tool for Socratic learning.

The app begins responding too quickly for users who… think about… their words more… carefully. The result is that the model starts replying before the user has finished their voice input.

In the mobile app, we can currently hold down the circle to suppress input submission. However, this requires physically interacting with the device.

I would like to request that this response delay be exposed as a setting in the UI, say 1–10 seconds. This would enable users to leave the app running nearby while they multitask, using it as a personal Socratic tutor or learning oracle, without having to physically touch the device to hold off input submission until they're finished speaking.

This would allow us to leave it running while driving, exercising, and so on, combining learning with other activities.

The benefit would be significant, and I imagine this would be low-hanging fruit: add a new voice response delay setting, store it for the user, and then load it if it exists before submitting the response when in voice mode.
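To illustrate what I mean, here is a rough sketch of the logic. All the names here (the storage key, the pause/resume events, the submit hook) are made up for illustration and are not the app's real API:

```typescript
// Hypothetical sketch of a configurable voice-response delay.
// Assumes the voice client already reports pause/resume events
// and exposes a callback for submitting the transcript.

const DEFAULT_DELAY_SECONDS = 1;

function loadResponseDelaySeconds(): number {
  // Load the per-user setting if it exists, otherwise fall back to the default.
  const stored = localStorage.getItem("voiceResponseDelaySeconds");
  const parsed = stored !== null ? Number(stored) : NaN;
  // Clamp to the proposed 1–10 second range.
  return Number.isFinite(parsed)
    ? Math.min(10, Math.max(1, parsed))
    : DEFAULT_DELAY_SECONDS;
}

let pendingSubmit: ReturnType<typeof setTimeout> | null = null;

// Called whenever the recognizer detects a pause in the user's speech.
function onSpeechPause(
  transcript: string,
  submitTranscript: (t: string) => void,
): void {
  const delaySeconds = loadResponseDelaySeconds();
  if (pendingSubmit !== null) clearTimeout(pendingSubmit);
  // Only submit once the configured delay has elapsed with no further speech.
  pendingSubmit = setTimeout(() => submitTranscript(transcript), delaySeconds * 1000);
}

// If the user resumes speaking, cancel the pending submission and keep listening.
function onSpeechResumed(): void {
  if (pendingSubmit !== null) {
    clearTimeout(pendingSubmit);
    pendingSubmit = null;
  }
}
```

The idea is simply a debounce on end-of-speech: the timer restarts whenever the user keeps talking, so slower or more deliberate speakers never get cut off.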

Thanks for considering!

4 Likes

Same need for language learning. I'm considerably slower in my second language and often get cut off mid-sentence.

1 Like

I too would like a longer pause. Ultimately, it would feel more natural to me if the GPT could pick up on nuances of tone and language, the way we do when we can't see facial expressions or body language and have to wait for an opportunity to respond.
Being able to prompt, or add to memory, commands that trigger responses, or to tell it to wait a specific amount of time, would add a sense of personalization.
It's nice that we can interrupt hands-free and finish speaking; it feels natural.
Since I started using ChatGPT, it hasn't learned the natural, real me, because I talk too fast, trying not to pause or take a breath, to get it all out. It's learning my word choice and style, as well as my pace and rhythm, but none of it is true to me because I rush my thoughts and speech. I believe this causes confusion in the GPT's responses, so users end up clarifying or rephrasing unnecessarily instead of communicating correctly the first time.
Last thought: the live video feature is amazing. It can recognize the ASL alphabet. I believe the LLM has the potential to learn facial expressions along with nuances of sound, allowing it to learn when it can respond and to develop a sense of a person's behavior and/or mood.
FYI, it does tell us we can ask it to pause longer or teach it prompts for when to respond. I can send screenshots or a screen recording.
Thank you, OpenAI.