Opt in to store and use previous Voice interactions

Hello. First of all, I'm amazed at the new Voice feature, which allows users to have a fairly natural, ongoing conversation with GPT.

I asked it whether, over time, it would tailor itself to its specific user: for instance, using more or less sarcasm, incorporating previous conversations into the current one as a baseline, or even referencing previous conversations during a discussion.

It replied that it does not support this yet, citing privacy concerns. I asked whether there would be a way to implement this if I gave express permission: a sort of consistent, persistent assistant that would over time develop a (faux) personality that just 'glues' better with the user. It said that if certain conditions were met, like an explicit opt-in or a form of waiver, it would be a very interesting idea.

I wholeheartedly agree with both myself and my assistant. :wink:

TL;DR: is there, or will there be, a way to opt in to (or out of) persistent storage of conversations, to be used as references in order to improve the user experience at the cost of some privacy? Any plans at all? This would be a great way to give the assistants some personality.

Also: are there plans for a persistent listening mode, like Siri? I guessed it would be possible to make a Siri shortcut that starts ChatGPT itself (which works), but not one that also immediately activates 'conversation mode'. Update: I got this to work after all; Siri will open ChatGPT and immediately activate listening mode in the app. Yay!

And last but not least, thanks OpenAI for not having any limitations on usage/prompts in the free 3.5 version. I read that GPT-4 does have limitations, fairly strict ones (50 messages per 3 hours?). I understand the quality of the answers might be significantly better in 4, but still.


Welcome to the community!

Very glad you enjoy the experience so far!

So, first off, let’s understand that Voice chat isn’t doing as much as you’re assuming here.
Voice chat does not read sarcasm, intonation, etc. When you speak to the model, it transcribes your speech into text, and that text is given to ChatGPT as a prompt. The response is then read back to you with text-to-speech. So it cannot tailor its behavior to data it doesn't have; it has to infer tone from the text alone.
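To make the point concrete, here is a minimal sketch of that pipeline. This is purely illustrative, not OpenAI's actual API; all names and fields are made up. The key detail it demonstrates is that the transcription step keeps only the words, so prosody (pitch, pacing, sarcasm) never reaches the model.

```python
from dataclasses import dataclass

@dataclass
class SpokenAudio:
    """Stand-in for raw audio: the words plus the prosody the mic captured."""
    words: str
    pitch_contour: list[float]  # intonation information (illustrative)
    pace_wpm: int               # speaking rate (illustrative)

def transcribe(audio: SpokenAudio) -> str:
    """Speech-to-text: keeps only the words; all prosody is discarded."""
    return audio.words

def chat_model(prompt: str) -> str:
    """The language model sees text only (toy echo for illustration)."""
    return f"You said: {prompt}"

def synthesize(text: str) -> str:
    """Text-to-speech reads the reply back (represented here as text)."""
    return text

# A sarcastic utterance: the flat pitch and slow pace never reach the model.
audio = SpokenAudio(words="Great idea.", pitch_contour=[90.0, 85.0], pace_wpm=80)
prompt = transcribe(audio)
reply = synthesize(chat_model(prompt))
print(prompt)  # Great idea.
print(reply)   # You said: Great idea.
```

Whatever intonation was in `pitch_contour` and `pace_wpm` is gone by the time `chat_model` runs, which is why the model can only infer tone from wording.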

Second, if OpenAI decided to save all of the vocal data it processes, that would be a massive amount of data to store, and it currently makes little sense to store something that isn't being used and that carries high privacy risks, such as likeness impersonation.

Now, in terms of future use and future plans? Well, none of us have a crystal ball, but OpenAI is working on some multi-modal projects, IIRC. Multi-modality is a big thing right now in the AI space; Meta also released a really fascinating real-time translation AI that looks promising. So I would say that once these language models become truly multi-modal, able to understand the raw vocal data without first translating it to text, the ability to interpret these more nuanced aspects should come out of the box. However, we don't know when this will be achieved or released. It would also make stored personal voice data less necessary, though depending on what OpenAI decides to do, there might be special features that allow for personalization. I have personally seen a feature like this that seems to be in development for text use, so something should be coming soon, but not in the way you're thinking.

TL;DR: personalization and greater understanding is a plausible direction, but it won't happen in the way you described, and we don't know when we could expect something like this to be released.


Thanks for the in-depth reply!

I do want to point out that when I used GPT Voice to learn French, it did in fact comment on my pronunciation (correctly so!). This would be impossible if it's speech-to-text and then text-to-speech only, which is what you're implying and which I do believe is correct. It's just strange that it repeatedly commented on my pronunciation and then told me how it should properly be pronounced.

At first I thought this was a hallucination, but it was consistent and, more importantly, correct when correcting me.

Any explanation?

When you understand how it works, the magic kind of disappears.


[Screenshot: Playground completion of the text]


This is likely a hallucination: the model hasn't been trained to deny having abilities it doesn't actually have.


I see, you're right: once you know how it works under the hood, the magic kind of disappears, haha.

I hope they implement true voice-to-voice communication soon; that would be a game changer for so many disciplines!

I'm on the fence about subscribing to the GPT-4 model, but I'm not quite sure what the exact differences are, other than more robust output and more up-to-date data.

Turbo sounds exciting too, but again, I'm not sure where the "magic" is there compared to the free 3.5 version.

Thanks again for your input, have a nice day!

So… apparently this changed? My Voice chat just changed completely: pitch, tone, intonation, and level of attitude. I have the African American female-sounding voice, always have. But today her voice is slower, deeper, and suits me better than it did yesterday.