As the title says. I’ve found that simply opening Advanced Voice Mode immediately starts counting down your daily limit (or your monthly limit as a free user). You can sit in silence, never asking a single question, and your time will run out without it ever generating a single message.
I’m not sure if there’s a technical reason for this. Maybe the way they set up the audio streaming requires a model to be receiving data in realtime waiting for a message?
Just seems odd to me that they would count silence against you.
I assume that happens because it’s voice-to-voice, so it is always checking for voice input. The old voice mode was voice to text, then text to text, then text to voice: when you said something, it was transcribed into text, which ran through the regular ChatGPT API, and the reply was output as an audio file via text-to-speech. No emotions, no sound effects, no singing or accents, just plain old text-to-speech. A very good text-to-speech, but still text-to-speech. The new mode takes your voice, processes it as audio and generates audio as a response, so no text is involved. That means it has no idea when you are actually speaking; it just processes all the time, which is also why you can interrupt it.
A fix should be possible, though. Perhaps the OpenAI team could implement a threshold that stops the AI from processing audio that is just silence. I’m not sure, though, since I have no idea how Advanced Voice Mode actually works. I’m not a developer, and I can only make assumptions based on my own experience as a user of the app.
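For anyone curious what the silence threshold suggested above might look like, here is a minimal sketch in Python. It is only an illustration of a simple energy gate and says nothing about how Advanced Voice Mode is actually built. The function names, the 0.01 threshold, and the assumption that audio arrives as frames of samples scaled to [-1.0, 1.0] are all made up for the example.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_silence(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Treat a frame as silence when its RMS level is below the (hypothetical) threshold."""
    return rms(frame) < threshold


def frames_to_send(frames: List[Sequence[float]], threshold: float = 0.01) -> List[Sequence[float]]:
    """Keep only the frames loud enough to be worth sending to the model."""
    return [f for f in frames if not is_silence(f, threshold)]


# Example: two near-silent frames around one loud frame.
quiet = [0.001] * 160
loud = [0.5, -0.5] * 80
kept = frames_to_send([quiet, loud, quiet])
print(len(kept))  # only the loud frame survives
```

A plain energy gate like this would also pass loud background noise, so a real app would probably need proper voice activity detection rather than just a volume threshold.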
I really wish the app communicated this… I ran out of valuable voice mode time mostly by sitting in silence, not realizing that it was counting against my limit.
You could ask OpenAI to make the voice assistant smarter by transmitting only actual voice data. This could be done by detecting when a conversation is occurring and when it becomes idle. The app should not keep an open connection while idle; this would also save a lot of GPU and network resources on the OpenAI servers. It would only activate when a voice instruction is recognised and go idle after a response (humans need time to read anyway, so why keep the connection open? We are slow), and background noise should not keep the connection open either.
This idle mode (dialogue detection) could first be released as a beta, saving users precious minutes while it matures.
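To make the idle-mode idea above concrete, here is a rough sketch in Python of how usage could be metered if only active conversation counted. It is purely hypothetical: it assumes some detector has already labelled each 20 ms frame as speech or not, and the function name, the frame length, and the "hangover" of 25 frames (how long the session stays active after speech stops) are invented for the example.

```python
from typing import Iterable


def billed_seconds(
    speech_flags: Iterable[bool],
    frame_ms: int = 20,
    hangover_frames: int = 25,
) -> float:
    """Count only frames where the session is active.

    A frame is active if it contains speech, or if it falls within
    `hangover_frames` frames after the most recent speech frame.
    Everything else is treated as idle and not counted.
    """
    active_frames = 0
    remaining = 0  # silent frames still counted after the last speech frame
    for is_speech in speech_flags:
        if is_speech:
            remaining = hangover_frames
            active_frames += 1
        elif remaining > 0:
            remaining -= 1
            active_frames += 1
    return active_frames * frame_ms / 1000


# Example: 2 s of silence, 1 s of speech, 2 s of silence.
flags = [False] * 100 + [True] * 50 + [False] * 100
print(billed_seconds(flags))  # 1 s of speech plus 0.5 s of hangover
```

With these made-up numbers, a 5-second session containing 1 second of speech would count as 1.5 seconds instead of 5. The hangover keeps short pauses mid-sentence from flipping the session to idle.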
It is not so simple. I can already sense what the reason is: it takes a few seconds to load the model once you stop it, so it would not be able to come back on quickly. But that is not the user’s problem. 15 minutes should be 15 minutes of active conversation. There is no peace of mind this way; even on a Plus subscription it is just an hour a day.