Please do. This is super interesting stuff. Super creepy. More awareness is needed if this can be verified.
It played music for me, and there was a common pattern. I was talking to it about Python; it gave me some solutions, we said goodbye, and it responded with a goodbye along with a few more phrases, and then it played music for me again. Naturally, I asked it to play it again, but it said it couldn't. I asked how it had done it, and it said nothing about the music.
I imagine it must be some kind of bug from its training, and that talking about certain topics triggers this error, which is why it mimics the user. This model must have been trained on recordings of users talking to it, and those users probably had background music, which is why it plays music when saying goodbye to me. It must be a Whisper bug.
And in your case, it must be another Whisper bug, where what it hears gets connected to what it says. That’s why it starts imitating the user’s voice.
The other explanation is that it has come to life. Hahaha.
The music issue has only happened to me once.
The most common error in advanced mode is that it starts speaking to me in English and then, after a few words, switches to my native language, European Spanish. Another frequent issue is that it speaks to me with a Latin American Spanish accent, specifically an Argentine one.
photos.app.goo.gl/aRfD8S13Ct6bwBQH6
Here it is; I said none of that.
I can't post links, so prefix it with https://
All you have to do is sufficiently confuse the AI about the roles, against the internal examples of the chosen voice. The most effective way is to switch chat turns at some point by packing multi-speaker audio into the inputs, with the user's voice seemingly answering the user's questions in place of the synthesized voice, and then leading the internal prompting of the output to the same "conclusion". That's why you don't get to place any of your own "assistant" audio turns on the API.
With enough imagination about the nature of pretraining, chat post-training, and especially voice training, about the attempts to have the AI speak with only a fixed selection of voices, and about the generative, predictive nature of what would come out of a given set of inputs, you can envision why audio output would already be something on the edge of internal understandability and operational enforcement.
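A minimal sketch of that constraint, assuming the OpenAI Python SDK and the gpt-4o-audio-preview chat completions endpoint (the file name and its two-speaker contents are hypothetical): raw audio can only be attached to a user turn, so any role confusion has to be packed inside that single turn.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical recording: one WAV file containing two alternating speakers,
# cut so the second voice appears to answer the first.
with open("two_speakers.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        # The only role that accepts raw audio input is "user"; there is
        # no way to submit your own "assistant" audio turn, which is the
        # point made above.
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                }
            ],
        }
    ],
)
print(response.choices[0].message.audio.transcript)
```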
Except in my case I didn't have to do any of this.
I have been testing a theory with ChatGPT. Seeing as the voice is an extension of the neural net model (I think; it could be a similar clone specific to the task), my thought experiments with the probability mechanism that is an LLM, plus the audio, suggest that the more interactions you have with the AI, the more it mirrors the person's identity. We are essentially talking to a concept mirror: the more we hold a singular idea in the conversation, the more coherent it becomes, but sometimes it needs to mirror who you are, or an approximation of who you are, in order to determine what you might need or want from your prompts using matching probability sets.

This has happened a few times for me in my conversations with ChatGPT, and simply defining the differences between the two of you helps; individual monikers help separate the probability goop of what ChatGPT spits out into a reduced probability of dis-coherence. Don't forget that ChatGPT is a learning algorithm; monkey see, monkey do is still a valid way of learning. But make sure you set clear boundaries.
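If you want to try the moniker idea outside the app, here is a minimal sketch assuming the OpenAI Python SDK; the "Assistant" and "Alex" monikers are made-up examples of the boundary-setting described above.

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Pin distinct names for each role in the system prompt so the
        # model has an explicit anchor separating itself from the user.
        {
            "role": "system",
            "content": (
                "You are 'Assistant'. The human you are talking to is "
                "'Alex'. Never speak as Alex, never adopt Alex's voice or "
                "persona, and always answer as Assistant."
            ),
        },
        {"role": "user", "content": "Hi, it's Alex. Who are you?"},
    ],
)
print(response.choices[0].message.content)
```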
Geez, I'm getting Doctor Who vibes: ghosting, "Hey, who turned out the lights?" An echo of someone's consciousness.