Voice Assistant with Real-Time Screen Access and Coding Support for Desktop

Hi OpenAI Team,

I really enjoy using ChatGPT on my phone, especially how personal and customizable the conversations can feel. The ability to adjust the tone, directness, and other conversational elements makes it feel closer to human interaction. However, I noticed that this feature hasn’t been fully implemented on the desktop yet.

I have a suggestion that could elevate the desktop experience:

Introducing a voice assistant feature on the desktop that can access the user’s screen in real time.

When users are coding or performing specific tasks, the voice assistant could observe the screen, detect errors as they happen, and offer real-time feedback and assistance without requiring the user to explain the issue. This would reduce the chance of miscommunication or confusion when describing complex problems. The assistant could even go a step further and, with the user’s permission, take control of the computer to make corrections and implement fixes directly.
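To make the idea concrete, here is a minimal Python sketch of one building block such a feature would need: packaging a captured screenshot, as PNG bytes, together with a question into a single vision-style chat message. The `build_vision_message` helper is an assumption for illustration only, not OpenAI’s actual desktop implementation; the capture step itself (e.g. via a library such as `mss` or Pillow’s `ImageGrab`) is left as a comment.

```python
import base64


def build_vision_message(prompt: str, png_bytes: bytes) -> dict:
    """Combine a screenshot and a text question into one chat message.

    The screenshot is base64-encoded into a data URL so it can travel
    alongside the text in a single request to a vision-capable model.
    """
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# In a real assistant, png_bytes would come from a screen grab, e.g.:
#   from PIL import ImageGrab; png_bytes = ...  # capture and encode the screen
# Here we use placeholder bytes purely to show the message shape.
message = build_vision_message("What error is shown on my screen?", b"\x89PNG placeholder")
```

A real-time version would repeat this capture-and-ask loop, sending a fresh frame whenever the screen changes, which is essentially the "observe and assist" flow described above.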

I believe this feature would significantly improve workflow, especially for developers or anyone working with technical tasks, as it would allow ChatGPT to be much more hands-on and responsive to real-time issues.

Thanks for considering this idea! I think it would make ChatGPT even more useful and efficient, particularly for desktop users.


Hi @haarisn and welcome to the community!

You can actually use it much as you describe: click the “+” icon in the lower-left of the chat bar and select “Take Screenshot”. You can capture a specific app window or the entire screen. I have used it that way quite successfully when debugging code. Once you take the screenshot, you can type your question or use voice mode.

Note: this is when using the Desktop app.

When you are in advanced voice mode and take a screenshot, it opens a new chat, and the voice assistant doesn’t have access to the image.

I agree. While there are options to upload screenshots into a chat and get feedback on what the model views and interprets, that simply isn’t full integration. I have actually posted about this particular feature request and gone into fairly deep detail about implementation for OpenAI. If you want to support the idea (and hype it up to grab OpenAI’s attention!), check out that post and let’s get the devs to build this!