A fusion of the voice models would be ideal

Having two different voice models drives me crazy. One can browse the internet, has an American accent, and speaks like a Latin American (in Spain). Sometimes it speaks in English, and I don’t understand a thing. It takes a long time to transcribe, and when it responds, it is far too wordy. I’ve tried asking it to be brief, but even when it shortens its responses, they are still too long. Additionally, there’s a frequent error where it somehow interprets that I’m saying something like “follow me on my channel,” or “watch my video and give me a like” (not literally, but something along those lines).

Then there’s the advanced version. It can’t browse the internet, its voice is different—less masculine—without an American accent, although sometimes a bit of an Argentine accent slips in. It can be interrupted while speaking, which I love! But if you enter the chat and try to copy something, the interaction is lost, and you have to start over.

Its most common error is that the internet-browsing voice with the American accent suddenly pops in saying something like, “My programming doesn’t allow me to talk about that.” Once this starts happening, it happens frequently. Also, sometimes it doesn’t finish saying everything, so I have to go to the written chat to read it.

I wish they would merge the two, keeping the best features of each, because if they combine all the flaws, it will be a disaster.

Another thing I’ve noticed is that the more they’re personalized, the more different they become. It’s subtle, but I’ve asked both of them the same questions, and each gave me a different answer.

Yeah, I guess you’re not the only one with this wish.

People would also like to use advanced voice mode in Custom GPTs, for example, or in chats where images are being created or something has been uploaded. Right now, as soon as you do any of that, advanced voice mode is gone.

A little research turned up this:

Those models are inherently different (from my understanding).

So, for now, you can only use one or the other, not both at once and not some mix of the two.

P.S.: I don’t know if it’s possible with those models. Let’s hope they either release a new one that can do this, add it as a feature, or evolve the current one to be able to do so.

Definitely, if they only keep one, it should be the old one, with the added advantage of being able to interrupt it.
I gave a lot of feedback about the old one’s English accent and how it often mixed English with Spanish. Despite all that, I only use the old one currently.
The new one has tons of restrictions and feels like talking to a politician – they talk a lot but say nothing (this expression is very common in my country).
So, I always type “Hello,” and then I can use the audio, knowing the old model will show up. If I don’t type something beforehand, the new model automatically appears.
I don’t know how old you all are here, but if anyone watched the Knight Rider series, the old one is KITT to me, and the new one is KARR.