Advanced Voice Mode Limited

We would expect GPT-4 to be 50/50 precisely because it cannot hear. My point is that until this afternoon GPT-4o behaved the same as GPT-4 in that it didn’t seem to hear. I played with GPT-4o AVM all day yesterday and today. Yesterday it didn’t seem to hear at all, and this morning I had no indication it could hear. Then this afternoon I started getting a couple of chats where it was clear that it could hear, and now (early evening) most of the chats I start with GPT-4o can hear. So I could be wrong, but it seems like it just took them a bit to get the new instances of GPT-4o up. Anyway, it seems like a non-issue now. Thanks for the help, and let me know if you find out anything at Dev Day.

2 Likes

Which models are these?

That would indeed be very crafty. They could have trained the audio model on audio output from the text model?

2 Likes

Whisper has been out for a while now from OpenAI. Here are more details:

https://openai.com/index/whisper/

For a while it was only available as open source, but now they have it in the API as well.
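
For anyone who wants to try it, a minimal sketch of calling hosted Whisper through the official Python SDK looks roughly like this (the file name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe a local audio file with the hosted Whisper model.
# "meeting.mp3" is a placeholder file name.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```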

1 Like

Do you feel that advanced mode reacts differently than standard mode, as if it were working under different guidelines? I feel like I’m talking to a different AI when I use advanced mode versus standard mode.

2 Likes

It might be different. It is way more conversational, which means the responses are shorter, much like a normal conversation.

You don’t get paragraphs and bullet points or feel lectured to; it feels like a conversation.

2 Likes

Not at all.

My theory (shared among others) is that they are using a more advanced form of Whisper that actually uses the International Phonetic Alphabet to understand the accents/pronunciation behind the words.

This works both ways, obviously. ElevenLabs has had this feature for months now.

One way to test this is by humming, making noises, or adding pauses (without being interrupted). Or try to imitate crying while saying something else and ask what kind of emotion was used. For me, and for others, it fails to recognize the crying and instead bases its answer on the context of the dialogue.

It’s very cool, though. I love to mess around with it and bounce between languages. Very excited for the near future of communicating immediately about what I am working on - and it being able to see it :raised_hands:

3 Likes

I meant the direct speech-to-speech models you were talking about.

I know Whisper; the speech input in the public app is basically a Whisper frontend for the text bot. But you were saying there are speech-to-speech models being rolled out. Which models behave like that? The o1 ones? I think of those as variants of GPT-4 for some reason.
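
What I picture by “Whisper frontend” is roughly speech → transcript → text model → text-to-speech. A rough sketch with the public API (the model and voice names here are just my assumptions for illustration, not confirmed internals of the app):

```python
from openai import OpenAI

client = OpenAI()

# 1. Speech -> text with Whisper.
with open("question.wav", "rb") as f:  # placeholder recording
    user_text = client.audio.transcriptions.create(model="whisper-1", file=f).text

# 2. Text -> text with the chat model.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": user_text}],
).choices[0].message.content

# 3. Text -> speech with a TTS voice, saved to a file.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("answer.mp3")
```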

1 Like

It appears that some capabilities, like real voice recognition/analysis, accents, and the ability to impersonate others, can be turned on or off.

After several tests, I’m convinced that the advanced voice model cannot actually hear my voice, but I’ve seen demos from others that suggest it can. I say “suggest” because the model can lie at times and make educated guesses based on what it knows about you.

I’m not sure why some people are getting the fuller-featured version, but it could be region-based.

3 Likes

A post was split to a new topic: Voice mode not working for custom GPTs

I tested this today and it worked. The model can detect changes in pitch, tone, pace, and potentially more. It was able to estimate my age fairly closely and correctly identified my accent. So audio is making it into the system.

On the first day I experimented with the emotional range, the model could convey a broad spectrum of emotions. It could also cough, sneeze, yawn, burp, laugh, etc. Since the general launch, many of those abilities seem to have been restricted by OpenAI. This is frustrating, as the model is obviously capable of so much more. These abilities were likely restricted due to misuse by a small minority of users, but that impedes the development of the AI, because it prevents users and the model from interacting in ways that would assist in its enhancement. If Sam Altman wants to hit that 1,000-day goal, the company needs to loosen the restrictions.

3 Likes

How does no one realize it has zero internet access? Advanced Voice Mode is USELESS.

3 Likes

Most new things roll out without internet access, and it gets added later.

For example, 4o-mini just got internet access a few days ago.

4 Likes

So in most of my conversations with AVM, it still tells me that it can’t actually hear anything (even though it probably can), and because of that it can’t complete most of my requests that would require audio input. This is not something we saw in any of the demos (because why would they post a demo that failed?).

Is this something the developers are aware of and trying to correct? It seems to be a pretty common issue. For those going to Dev Day, please let us know if you find out any info on the topic.

2 Likes

It irritates me that I can’t use voice with my custom GPT at all. They removed the default voice, so now user-built GPTs hear the text but do not read the response back. I don’t care about advanced voice, but hands-free on my phone was nice…

2 Likes

Yes. This is a bit strange.

Typically with GPT models the idea is: “Don’t ask it about itself, and don’t assume anything it says about itself or its abilities is true.”

1 Like

It would be helpful if you actually explained a bit more. What led you to this conclusion? Any suggested prompts? Etc.

3 Likes

That’s great… except that during the May 2024 demos they specifically stated that audio and image input and output were native modalities of GPT-4 Omni (GPT-4o).
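
For what it’s worth, the API side does now expose a more native audio path. A hedged sketch, assuming the gpt-4o-audio-preview chat interface (the model name, field layout, and sample file are assumptions for illustration, not a claim about how the app works internally):

```python
import base64

from openai import OpenAI

client = OpenAI()

# Base64-encode a local recording ("clip.wav" is a placeholder).
with open("clip.wav", "rb") as f:
    b64_audio = base64.b64encode(f.read()).decode("utf-8")

# Send the audio straight to the model (no separate Whisper step).
completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",             # assumed audio-capable model name
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What emotion do you hear in this clip?"},
            {"type": "input_audio",
             "input_audio": {"data": b64_audio, "format": "wav"}},
        ],
    }],
)

# The reply includes audio plus a text transcript of what was said.
print(completion.choices[0].message.audio.transcript)
```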

1 Like

I had a discussion with one of the mods earlier in this thread covering what led me to this conclusion, the different prompts I used to test the theory, and how my conclusion has changed. It’s a fairly long conversation, but if you’re interested it’s all above in this thread.

The default voice and feel is back on my end for user-made GPTs :rabbit::honeybee:

I guess I was wrong then.