There’s a great deal of potential in ChatGPT Voice Mode for helping people learn languages. However, hearing the chatbot speak a word once is often insufficient, and asking it to repeat itself is wasteful on both ends. I think it would be very helpful, and might even decrease server load, if Voice Mode let you view the text spoken by ChatGPT in real time rather than just some fancy graphics. When learning Chinese, it is very helpful to see the pinyin to understand the pronunciation of the word I will try to repeat.
I would love that feature. I feel I can follow the answer better by reading it.
Sometimes I open my computer and refresh the page to read what the app is saying on the phone.
I am looking for the very same functionality. So ++ for this feature request.
Ditto. Talking and typing combines sight and sound. Speech is borderline useful as implemented.
Yes, this would be very useful!
Adding my voice for the same feature.
+1. I am really looking forward to having this feature. It would help us understand the length of the answer and quickly glance at the content to check if it’s aligned with our expectations or if we need to iterate on our prompt.
Yes, I was just going to write this suggestion. I have used GPT a ton to learn computer programs, and sometimes it would be nice to not have to reprompt for a step and rather just look at the screen from time to time.
Just came here to say the same. Being able to talk and see the code I am working with at the same time is huge!
agreed!!! pretty sweet.
This is exactly why we need it. And the text is already there; all that needs to happen is to remove the fullscreen voice mode view.
Please display text while speaking! This is very useful for language learning. For example, I’m practicing TOEFL speaking tests with ChatGPT voice, and I’d like to check the script while listening to the response so that I don’t have to ask ChatGPT to explain some words in the text.
This is a much-needed function, and it is surprising that it isn’t an option. Please add text subtitles or at least let me choose to view the text from each response by clicking something that doesn’t end the voice chat.
Agree! I’ve been telling my students how to use this tool to help them.
Absolutely need this functionality. When I close out the voice functionality, I sometimes lose the discussion. Still a tad buggy.
Yes, please consider this! I’m currently using GPT to help tailor my resume for various job applications, and being able to just speak my feedback, read the options GPT presents, and then copy/paste inline with the voice chat would probably cut quite a bit of time out of the process.
This is the most missing feature in all AI GUI apps. It is virtually impossible to talk with AI about programming-related things, as you always need to see code examples.
Voice mode isn’t built to discuss coding, even in AI models presenting themselves as AI for coding. I don’t understand how the authors of these GUIs designed it. I noticed this problem after 1 hour of using AI. They have been developing it for years and don’t see it? That’s weird, maybe some parallel universe kicked in.
It would be gorgeous to talk to AI fluently instead of writing questions to explore code APIs. It would be something totally different from what we are using now or what we know.
It will be a breakthrough when this is available.
Also, I want to ask if anybody knows of an AI GUI that supports this feature.
I discussed this problem with AI over ~12 back-and-forth messages. Below is the last answer.
Q: none of these support simultaneous chat and voice mode
ChatGPT said:
You’re right—most AI coding assistants focus on text, and very few support simultaneous voice and chat interactions in a standalone Windows app.
The closest options you might consider:
- ChatGPT (Mobile App) – Supports voice and text (but no Windows app yet).
- Microsoft Copilot in Windows 11 – Has voice input, but chat and voice are not fully simultaneous.
- Google Gemini (via Android App) – Can handle both, but not on Windows.
Unfortunately, a true AI assistant for coding that allows real-time voice and text on Windows doesn’t seem to exist yet. Would you like a workaround, such as using voice-to-text with a text-based assistant?
Tip - You can have GPT open in voice mode in one tab (or on your phone) and have the same chat open in another tab not in voice mode. If you refresh regularly you can see the text from your conversation come up.
Obviously not ideal but gets around the issue until they develop this feature.
+1, would be super useful for language learners.
PLEASE, this would be so useful. So easy to implement too; I think the text is generated in real time anyway, it’s just covered up by the voice circle UI thing.