Currently, there is no telephone access to ChatGPT. This limits users without smartphones. The introduction of national, fee-based phone numbers for voice chats with ChatGPT is proposed. This solution would enable access for a wider user base and create barrier-free accessibility. At the same time, the fee-based calls could contribute to covering the operational costs of ChatGPT.
An important user group would be people of all ages without access to modern communication technologies. They could call ChatGPT, for example, to combat loneliness or to clarify everyday questions. This would significantly improve their quality of life. Implementing such a phone number would provide added value for many user groups and represent a new source of income for the operator of ChatGPT.
I’m not sure they would implement this anytime soon.
But this reminds me, I do have a project in the pipeline to hook a speaker up to the API and have the API triggered at random intervals and the speaker emitting random things throughout the day that the AI is producing in the background
You could just put such a speaker in a public place and have AI PSA’s. No other devices required from the perspective of the user
But it’s more of a side/learning project for various standalone dev boards, more than anything.
I do think a virtual phone call with AI is cool, even better a virtual video conference. Someday … someday.
Thank you for sharing your thoughts!
It seems we might be discussing somewhat different things, don’t you think?
My feature request focuses on one-on-one conversations between a user and the AI, distinct from a multi-party telephone conference.
A title like “Telephone Conversations with ChatGPT” or “ChatGPT Interaction via Plain Old Telephony (POT)” would more accurately reflect the individual voice interaction aspect of my proposal. This idea is about establishing national, fee-based phone numbers to enable these personal interactions with ChatGPT through traditional phone calls.
Assuming there are no legal obstacles, anyone could set up such a service using the ChatGPT API. From this perspective, it could be seen as a wonderfully scalable, almost risk-free business idea, or even a sort of money-making machine.
I think the first challenge would be to get the latencies in inference down so that the AI could quickly respond back via voice.
This is probably the big “hang up” on this idea (pun intended)
As for a national pay per minute POTS line … I thought those days were over, honestly, with everyone having a cell phone these days. Not sure how this would work, or who would use it. Target demographic is ???
PS Talking real live inferences over voice here. There are already bots that essentially use pre-recorded responses and embeddings to respond to you. This is already out … like that ‘guy’ that calls from time to time asking money for the police brotherhood (or whatever) … he’s a total bot that uses canned responses based on his lag time in responding. So this is TTS (likely partial) + RAG + play nearest waveform via quick embedding/keyword style search, eliminating inference and TTS lags. And still laggy.
So lag is key here. Need super low lag. Like <500 ms to really pull this off. This is tough with a high quality LLM, let alone the overhead for RAG search + STT and TTS lags.
SOTA now on this is likely around 2 seconds, which is an eternity on the phone.
So if you wanna feel like a clown
To see your bank account balance go down
and chat those robot thoughts.
There’s several that have occasionally posted such hobbyist projects on social media. It can be hard to tell from just the phone number if it is an enticement to pay-per-call or international fee scams that hit your bill.
Admittedly, latency is a challenge in voice-based AI systems using such phone numbers. A quickly responsive “cerebellar AI” acting as an intermediary layer could be a solution. This AI employs short, natural fillers (“Oh, that’s interesting.”, “Let me think about that.”, “Stay on the line, I’m checking something.”), similar to the thoughtful pauses often found in normal telephonic consultations, while the main AI formulates a response. This reduces perceived wait time and makes conversations more natural and fluid.
Through training, the cerebellar AI can contextually apply these sentences, enhancing communication realism and comfort. Integrating this AI into systems operating over traditional phone numbers could significantly address latency challenges and improve service accessibility and user-friendliness.
For users, access to services via traditional telephone numbers is important, regardless of the transmission technology, such as landline, cellular, or VoIP.
I think the “natural pause” AI is an ingenious way to help bridge the latency gap!
As for POTS, again it’s dying. After a fresh google search today, look into FCC Forbearance Order 19-72A1.
Basically the FCC wants to kill POTS. The POTS providers are given the full OK to jack up prices by as much as 150% and even stop maintaining POTS infrastructure. So without government backing holding it up, POTS is going to be totally dead here soon.
POTS is a “left for dead” technology. The copper wires will be allowed to rot away, and the extremely high pricing will push away users, to likely VOIP, or cellphones.
I regret having used the term POTS, as it was completely misunderstood.
I simply meant standard telephony (regardless of the medium used), which follows the numbering plan according to ITU-T E.164 (see E.164 : The international public telecommunication numbering plan and https://www.itu.int/dms_pub/itu-t/opb/sp/T-SP-E.164D-2016-PDF-E.pdf).
In the GSM standard, this service is referred to as TS 11 (see excerpt from ETSI Standard 3GPP TS 22.004), as opposed to BS 20.
OK, I guess I am confused. From your original post, you are saying you would like it for users without smartphones to have access the ChatGPT. Which is admirable.
But 3GPP is a standard regarding mobile phones (ref). So yes, I am confused here.
So yes, there is a percentage of folks that use “dumb phones” or phones that are not “smart phones”, mainly to limit screen time and other distractions. Is that your target demographic?
So basically people “without smart phones”? Which are also those that want to limit tech and interference in their lives … which I get, but these people are extremely hard to reach with AI, and they certainly won’t pay.
If so, what percent of these folks is ready to take the AI plunge without a smart phone? I’m thinking it’s a tiny fraction of the population. And would they even pay for this service? You can get ChatGPT on your smartphone or device for free. Why would they pay?
So all I see is an unmotivated, non-paying, small audience. Thoughts?
But going back to voice only AI. Once we get that, it’s a game changer! However, latency is the big obstacle right now, but with special low latency models, and your “neutral pause” AI, I think we can get there.
I am already paying for ChatGPT-4. And I have been using it very intensively for almost a whole year, both on PC and on my smartphone. However, sometimes I want ChatGPT to answer a question when I am not in a position to type a prompt, for example, when I am driving.
Answer of ChatGPT:
*“You could utilize the voice-to-text features available on most smartphones and PCs. This function converts spoken words into text, which ChatGPT can then process. Activate this feature on your device and simply speak your question or command. The transcribed text will be inputted into ChatGPT. This method is particularly useful when you’re unable to type, such as while driving. However, it’s important to prioritize safety and adhere to local laws regarding the use of mobile devices while driving.”
Seriously, am I supposed to read the response from my smartphone while driving? Here, where I live, that’s definitely illegal.
So one thing I developed about a year ago is a system where I can talk to my iPhone, it sends a text to my server, responds back understanding all relevant history and RAG retrieved information, some OpenAI API model inference, and sends me a text back, that I can read the SMS or have text-to-speech from the phone vocalize the response.
This works well for edge cases where you are in the boonies, away from wifi, but have 5G, LTE or 3G signals available.
All this uses is the OpenAI API, some AWS serverless functions, a database, and some SMS API (I use Twilio).
I even created an interface where I can add friends and customers from my phone without additional programming. Also different AI personalities commanded to each person from my phone.
So it takes effort. But a doable fun project.
So yes with this setup, I can “talk to AI”. Also you can stuff the RAG with whatever crazy information you want the LLM to parrot.
This kind of project will get you into the AI user space quickly, so highly recommended!
The concept of using phone calls to interact with AI systems like ChatGPT underscores the ongoing significance of telephonic communication across various service sectors. Particularly in areas such as hospitality, flight bookings, and customer service, phone calls remain essential, especially for individuals who prefer personal interaction or lack access to other online technologies. Telephone calls provide a valuable complement to digital systems by facilitating personalized interaction and swift solutions for complex issues.
Integrating AI into telephone-based booking and information systems will significantly enhance these systems’ ability to understand and appropriately respond to complex or infrequent queries. This not only improves the user experience but also makes telephone-based services more versatile and user-friendly. For people without access to modern communication technology, such as the elderly or those in remote areas, this could make a substantial difference.
The ability to browse via telephone is another important application. Phone browsing will offer an accessible alternative for those without smartphones or in remote locations. AI technologies can help make this experience seamless and effective. By offering phone calls supported by AI, access to information and services is greatly expanded, potentially improving the quality of life for many. In well-served areas, smartphone users are likely to prefer text- or voice-based browsing through WhatsApp or other social media platforms, often using the…microphone icon.
Enhancing the functionality, in-call browsing and in-call assistance could be introduced, allowing users to activate an AI during a phone conversation with a specific command, akin to interactions with Alexa. Examples of this could include:
.“Hello ChatGPT! Please activate/deactivate real-time translation of my phone conversation into Chinese.”
.“Hello ChatGPT! If I want to go to LA now, what’s the fastest mode of transportation available, when will I arrive, and what’s the cost?”
.“Hello ChatGPT! When are the next public holidays in [specified location]?”
.“Hello ChatGPT! My daughter lives in [location]. Where is the best midway point for us to meet, balancing our travel times? Also, can you recommend a hotel where we can stay, and what would be the cost for each of us?”
.“Hello ChatGPT! Please call Mike and ask him if he would like to join our call! If yes, please add him to it!”
.“Hello ChatGPT! Please look for a half-hour meeting slot with Nancy in my calendar, coordinate the time with her or her calendar, schedule it in my calendar, and inform us during this phone call when it is set!”
.“Hello ChatGPT! Please follow our conversation about the sudden death of Erwin. Later in the phone call, when prompted, draft an appropriate condolence letter and read the draft to us!”
The feature of in-call browsing and in-call assistance would significantly enhance phone-based AI services, offering real-time assistance and information to address immediate user needs in a dynamic and interactive manner.