Misinterpretation of non-speech sounds as speech (post-update bug)

Hello OpenAI Community,

I’m reaching out to discuss a recent issue I’ve encountered with the voice recognition feature during voice chats. Previously, the recognition was quite adept at detecting when I began speaking. However, since the latest update, the chat seems to misinterpret non-verbal sounds such as coughs or sneezes as words and proceeds to transcribe them inaccurately. For instance, a simple cough was transcribed as “Thanks for watching”.

Is anyone else experiencing similar issues, possibly related to the new version of Whisper? I would appreciate any insights or solutions to ensure accurate voice recognition.

Thank you for your assistance!


Same here, “Thanks for watching” :joy:.
Besides that, I really want to try Whisper v3 via the API. I’m not sure whether it’s live yet; the guide says there is only one model available, “whisper-1”.
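For anyone else wondering, you can check what the API actually exposes by listing the models and filtering for Whisper. A minimal sketch using the official `openai` Python package; the two helper functions are mine, not part of the library:

```python
# Sketch: check which Whisper models the API currently exposes.
# Assumes the official `openai` Python package (v1.x) and an API key
# in the OPENAI_API_KEY environment variable.

def whisper_model_ids(model_ids):
    """Filter a list of model-id strings down to the Whisper ones (my helper)."""
    return sorted(m for m in model_ids if "whisper" in m)


def list_whisper_models():
    """Live check against the API (requires network and an API key)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    return whisper_model_ids(m.id for m in client.models.list())
```

At the time of this thread, `list_whisper_models()` would most likely return just `["whisper-1"]`, matching what the guide says.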

The funny thing is that before the update, it worked pretty well. Not perfect, but better than now.

It seems that the voice recognition system now reacts to every sound, every noise made near the device. When the sound stops, ChatGPT instantly switches to the “transcription phase” and does its best to interpret the recording as speech.
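That behavior is consistent with a naive level-based endpointer: if the trigger is just loudness, any cough or door slam starts a recording that then gets handed to the transcriber. A toy sketch of that idea; all names and thresholds here are illustrative, not ChatGPT’s actual implementation:

```python
# Sketch: why a loudness-only trigger fires on coughs. An endpointer
# that starts "recording" whenever a frame's RMS energy crosses a
# threshold treats any loud noise as speech. Illustrative only.
import math


def rms(frame):
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_triggered(frame, threshold=0.05):
    """Naive voice-activity decision: loudness alone, no speech/noise distinction."""
    return rms(frame) > threshold


silence = [0.0] * 160
cough = [0.5, -0.6, 0.4, -0.5] * 40  # short, loud burst, but not speech

print(is_triggered(silence))  # False
print(is_triggered(cough))    # True: a cough trips the gate just like speech
```

A real fix would need an actual speech/non-speech classifier in front of the transcriber, not just an energy gate.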

I hope that OpenAI fixes it sooner rather than later.

Yes. This was happening often enough that I had to add an instruction that if I said “Thanks for watching” it should ignore it and let me know something went wrong. One time, after a random noise, it spat out “Thank you for watching. If you have any questions or comments, please post them in the comments section.” I’ve also had it insert something about transcription services being provided by an LLC rather than what I actually said.


I’m experiencing the same thing. Anyone know if there is a fix coming for this?


I am experiencing this issue as well. Has this ever been addressed by OpenAI?


This is a funny bug, and it even happens in the ChatGPT app, which uses Whisper. Transcribed messages end with “Thanks for watching” and “Subtitles by the community of Amara.org”. The likely reason is that Whisper was trained on subtitles.
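If you call Whisper through the API yourself, there is at least a workaround: with `response_format="verbose_json"` each returned segment carries confidence signals like `no_speech_prob` and `avg_logprob`, which tend to look bad on these hallucinated endings. A sketch of filtering on them; the thresholds are my own illustrative guesses, not recommended values:

```python
# Sketch: drop segments Whisper itself suspects are not real speech.
# With response_format="verbose_json", each segment includes
# `no_speech_prob` and `avg_logprob`; thresholds below are illustrative.

def keep_segment(segment, max_no_speech=0.6, min_avg_logprob=-1.0):
    """Heuristic: keep a segment only if Whisper thinks it heard speech."""
    return (segment["no_speech_prob"] < max_no_speech
            and segment["avg_logprob"] > min_avg_logprob)


segments = [
    {"text": "Hello, can you summarize this?", "no_speech_prob": 0.02, "avg_logprob": -0.3},
    {"text": "Thanks for watching!", "no_speech_prob": 0.87, "avg_logprob": -1.4},
]
kept = [s["text"] for s in segments if keep_segment(s)]
print(kept)  # ['Hello, can you summarize this?']
```

It won’t catch every hallucination, but it catches the classic silence-to-“Thanks for watching” case surprisingly often.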


I want to point out that those messages, “Thanks for watching” and “Subtitles by the community of Amara.org”, are in fact attributed to the user (you and me). I’ve had it talk back and forth with itself this way until I was informed I had reached my ChatGPT limit. Anger. They can filter much of the language I use when I’m full of hatred. Why can’t they filter this stuff?


I don’t know why they haven’t fixed this yet. I’ve had ChatGPT put in its Memory that “Thanks for watching!” is a mistranscription of silence and it should acknowledge the mistranscription and then continue the conversation.

When the user does not say anything at all using the voice interface, it sometimes mistranscribes the silence as the phrase ‘Thanks for watching.’ This should be acknowledged by saying, ‘I think there was a mistranscription,’ and then continuing the conversation. Other phrases should not trigger a mistranscription acknowledgment unless specified otherwise.

Yes, I think this should be easy to fix, since it’s always the same hallucinations (always the same words). Overall, I have the impression that OpenAI has somewhat neglected further development of Whisper. Version 2 is now two years old, and Version 3 wasn’t really much of a leap forward, if any.
