I use the voice interface for ChatGPT on my phone almost exclusively. I rarely type into it.
One annoyance is the startup lag before it allows me to start speaking. Not sure why that is needed.
Occasionally I prefer to make my query quietly (i.e., by typing), depending on the environment. I also occasionally prefer to type so I can consider and revise my query before submission.
One final thought: current communication models are more like telegraph than natural dialog (i.e., demarcated ‘send’/‘reply’ turns). We would need a much more interactive conversation flow to reduce the need for thoughtful composition of interactions.
What I mean by a 100% voice interface is when you don’t need to type, not inside one specific app but at the OS level, where you have a voice assistant that knows all the apps and their data, and instead of, say, 10–20 taps to change some setting you just exchange 1–3 voice messages with the assistant.
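Just to make the idea concrete, a toy sketch in Python (everything here, the settings table and the keyword matching, is made up for illustration; no real OS exposes an API like this):

```python
# Toy model of an OS-level assistant: one utterance resolves directly
# to a settings action, instead of 10-20 taps through menus.
# Hypothetical names throughout; a real assistant would use an actual
# NLU model, not substring matching.

SETTINGS = {"wifi": True, "bluetooth": False, "dark mode": False}

def handle_utterance(utterance: str) -> str:
    """Find a known setting named in the utterance and switch it."""
    text = utterance.lower()
    for name in SETTINGS:
        if name in text:
            # crude polarity check: "off"/"disable" means turn it off
            SETTINGS[name] = not ("off" in text or "disable" in text)
            return f"{name} is now {'on' if SETTINGS[name] else 'off'}"
    return "I don't know that setting yet."

print(handle_utterance("turn off wifi"))     # wifi is now off
print(handle_utterance("enable dark mode"))  # dark mode is now on
```

One voice exchange instead of a trip through the Settings menus; the hard part, of course, is the assistant actually knowing all the apps and their data.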
Voice will possibly never be able to accurately nail punctuation, nuance, text formatting (CAPS, italics, bold, etc.), or, for example, things like deliberate misuse of homophones for comedic effect (at least not without needing to explicitly spell it out).
It will certainly get to the point of being good enough for most people, most of the time, but for people who care about having complete control over their expression, typing is, and will be, much faster and more accurate for the foreseeable future.
There’s also the fact that using your voice as a device input monopolizes that “channel”: you can’t easily, in the middle of “typing,” communicate with someone else using your voice, or even talk to yourself.
It’s just not a very practical human-machine interface.
When will voice commands be replaced by brain-to-cloud neuro-chips for knowledge and memory recall at the speed of firing synapses?
When can I connect to AI-Cloud++ at a cellular level, to become part of the hive?
When can we use quantum entanglement for instantaneous communication with connected devices or interfaces, including humans, animals, vegetables and… minerals, making us “one with everything”?
And at what point do humans become the weakness/constraint in the interface? Probably sooner rather than later.
Don’t you think that many of the things you mentioned may not be needed in the future? Expressing emotion and other nuances through formatting is actually a very reduced way to express emotion; voice is much better at this. And the other things simply follow the technology we have now; they will probably evolve along with the technology in the future.
There’s also the fact that using your voice as a device input monopolizes that “channel”: you can’t easily, in the middle of “typing,” communicate with someone else using your voice, or even talk to yourself.
That is an interesting point, but to me it seems like a point in favor of the “clip thinking” many people are not happy about.
Same here. I have a task on my list to start writing, but developing apps and exploring the models is so much more fun that the writing task never gets anywhere close to the top of the list.
To say the VEEEEEEEEEEERY least. For almost two years a buzzing thought hasn’t left my head: how did we suddenly end up inside the sci-fi we were all reading before?
I 100% agree, but… Something gets lost in translation.
As I type this, I’m contemplating how I would voice the difference between stressing a word with CAPS, italics, bold, ITALIC-CAPS, BOLD-CAPS, italic-bold, and ITALIC-BOLD-CAPS.
I’m sure someone knows how, but I don’t.
Or, how to offset a parenthetical with parentheses, commas, or em-dashes?
So, while I will agree that voice is much more expressive than text, I think it only matters if the communication is received aurally. If you are intending the message to be read, I think the best interface with which to create that message is the “native” one: keys.
It may come to pass someday that a sufficiently advanced AI will be able to discern our intention from the tone and cadence of our voices (or, more likely, there will be a convergence wherein we learn to speak in a way that is easier for the AI to infer our intentions from), but I see that as a good distance into the future, and I will note here that I’m usually very bullish about the technological future.
All of your points make sense to me, but I’ll try to reiterate the point from my previous message.
Typing/reading may be gone altogether in the future, paving the way for voice (and potentially some new kinds of communication interface, say emojis or some of their descendants).
Sentiment analysis of written text or transcriptions is available, but I haven’t read much about sentiment extraction from spoken audio itself. I imagine it’s not far behind.
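For the text side it really is off the shelf already; e.g., the Hugging Face transformers pipeline (a real library; by default it downloads an English sentiment classifier) applied to a transcription:

```python
# Sentiment of a transcribed utterance via Hugging Face transformers.
# pipeline("sentiment-analysis") pulls a default English classifier.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I can't believe the demo actually worked!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

The audio side, reading emotion from prosody and tone rather than from the words, is the part I haven’t seen as much tooling for.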
Even as a human, it’s easy to misinterpret another person’s spoken intentions. E.g., someone is passionate, but it comes across as angry, which wasn’t their intention. Or one person recognises a joke and the other takes it seriously.
But your response makes me wonder: at what point do machines stop trying to replicate humans and instead devise strategies that are superior to humans? I.e., what means of communication would computers use to communicate among themselves without catering for humans?
Of course, humans will keep trying to replicate humans, but if we reach AGI, the machines will look beyond that.