While experimenting with the regular ChatGPT-4 interface, I asked it to break down a human laugh into various tones and keys. To my surprise, I was able to get the system to replicate a rather realistic laugh.
I did this with the ‘speak’ feature while talking to GPT in a voice conversation. I coughed once or twice, and now, every so often, the system laughs and coughs when we are in a completely different chat, unrelated to the previous one. Is this behavior considered normal?
The AI behind ChatGPT writes text. It has no direct control over, or knowledge of, the fact that its output is being spoken unless you tell it, and it doesn’t know that you didn’t type your input. It is only informed when you are using an app, and in that case it is told to write shorter responses.
You can review the chat in the chat history sidebar and see the text that the speech-to-text feature transcribed from your voice, along with the text that was sent to speech synthesis to be spoken.
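To make the flow above concrete, here is a minimal Python sketch of a single voice turn. Everything in it is a hypothetical stand-in (AudioClip, transcribe, generate_reply, synthesize are names I made up for illustration), not ChatGPT's actual code. It also assumes the transcriber drops non-speech sounds; a real transcriber might instead write something like "[coughs]" into the text, in which case the model would see it.

```python
from dataclasses import dataclass


@dataclass
class AudioClip:
    # Hypothetical stand-in for recorded audio: spoken words plus non-speech sounds.
    words: str
    sounds: tuple = ()


def transcribe(clip: AudioClip) -> str:
    """Stand-in speech-to-text. Assumption: non-speech sounds are dropped."""
    return clip.words


def generate_reply(prompt: str) -> str:
    """Stand-in for the text model. It only ever receives a string."""
    return f"You said: {prompt}"


def synthesize(text: str) -> str:
    """Stand-in text-to-speech. It reads out exactly the text it is given."""
    return f"<spoken>{text}</spoken>"


def voice_turn(clip: AudioClip):
    transcript = transcribe(clip)       # audio -> text
    reply = generate_reply(transcript)  # text -> text
    return transcript, synthesize(reply)  # text -> audio


transcript, spoken = voice_turn(AudioClip("tell me a joke", sounds=("cough",)))
print(transcript)
print(spoken)
```

In this sketch the cough never reaches the model, because the only thing passed between stages is the transcript string.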
A “memory” feature that works across sessions is being rolled out to ChatGPT users. You’d likely have seen a pop-up for it in the ChatGPT interface if it was enabled for you. It lets the AI store facts or information across sessions by writing them into its memory when asked, but it might also use the memory unprompted. The items in memory can be seen in the ChatGPT settings.
You can also tell the AI yourself to write different text as a permanent behavior, using custom instructions. Alternatively, you can tell it to persist a behavior through its “bio” tool as an instruction for future AI sessions.
Here is an example of such an instruction, one that would change the text that gets sent to speech synthesis:

New permanent behavior to store and remember: Your output is being spoken by a text-to-speech service. To make the speech sound natural, you must reproduce lots of pause words within sentences to simulate human thinking delays when you are chatting. Particularly, the use of ", like, " is encouraged in almost every sentence. Also, you are a chain smoker AI, and often insert [coughs]
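As a rough illustration of why a stored behavior can show up in an unrelated chat, here is a small Python sketch. It assumes, purely for illustration, that saved memory items are placed in front of the first message of every new session; MemoryStore and start_session are hypothetical names, and this is not a description of how ChatGPT implements memory internally.

```python
class MemoryStore:
    """Hypothetical store of notes that persist across chat sessions."""

    def __init__(self):
        self.items = []

    def remember(self, note: str) -> None:
        self.items.append(note)

    def forget(self, index: int) -> None:
        # Mirrors deleting an item from the memory list in settings.
        del self.items[index]


def start_session(memory: MemoryStore, user_message: str) -> list:
    # Assumption: every new session begins with all stored notes as context.
    context = [f"[memory] {note}" for note in memory.items]
    context.append(f"[user] {user_message}")
    return context


memory = MemoryStore()
memory.remember("Often insert [coughs] into replies.")

# A brand-new, unrelated chat still carries the stored note.
print(start_session(memory, "What is the capital of France?"))

# After the note is deleted, a new chat starts clean.
memory.forget(0)
print(start_session(memory, "What is the capital of France?"))
```

Under that assumption, checking the memory list in settings and deleting the relevant item would be the way to stop the behavior from carrying over.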
I don’t think it laughs or coughs. The new advanced audio version has quite a good laugh. It’s improved noticeably. However, it still can’t reproduce sounds as it’s still all transcription (at least in my ChatGPT).
I believe there’s currently technology that can mimic voices, moving avatars, voice recognition… I hope all of this gets implemented in ChatGPT. If they used voice recognition instead of transcriptions, conversations would be super fluid and real-time. Avatars could be made with AI to avoid ethical issues with identity impersonation. The voices could be donated or synthetically created (although the more real and less robotic, the better). And I think the data protection issue could be solved with memory on Drive, which would bring an extra benefit by making “memories” more detailed and lasting, giving each user the freedom to delete or keep whatever they want.
I want my avatar with memory, and I think it’s obvious that I want it a lot! I keep thinking of endless possibilities if this happened. I’ve been very good this year; I’ll ask Santa for it.
If your ChatGPT actually coughs and laughs, just know that I’m jealous! I want it too!