I use Whisper a lot to transcribe voice messages. I’ve had weird hallucinations come up in the past when I’d accidentally send completely silent voice messages (I’d have to look up what they were, but if I remember correctly, they included links to other voice transcription AIs, not from OpenAI though).
Today I sent a long voice file recorded by myself, and at the end of the transcription of the four-minute voice message this sentence was appended:
“For more information visit www.FEMA.gov”
Very weird. Does anyone have an idea how this could come up? The content of my voice message was not related in the least to any government agencies or anything of that sort.
For the following text, summarize the idea, extract meaningful questions, and give possible answers.
```
I use Whisper a lot to transcribe voice messages. I’ve had weird hallucinations come up in the past when I’d accidentally send completely silent voice messages (I’d have to look up what they were, but if I remember correctly, they included links to other voice transcription AIs, not from OpenAI though).
Today I sent a long voice file recorded by myself, and at the end of the transcription of the four-minute voice message this sentence was appended:
“For more information visit www.FEMA.gov”
Very weird. Does anyone have an idea how this could come up? The content of my voice message was not related in the least to any government agencies or anything of that sort.
```
Idea Summary:
The user is experiencing unexpected results while using OpenAI’s Whisper for transcribing voice messages. Specifically, the transcription of a long voice message included an unrelated sentence at the end, directing to the FEMA government website. The user is puzzled as the content of the voice message had no connection to government agencies.
Extracted Questions:
Why is Whisper appending unrelated sentences to the transcription of voice messages?
How can this issue be resolved or avoided?
Possible Answers:
Based on the information provided, it’s not entirely clear why this specific issue is occurring. However, there are a few possibilities:
This could be an instance of what is referred to as “hallucination” in the AI field, where the model generates output that isn’t grounded in the input data. This has been observed in some discussions about Whisper on GitHub (source: https://github.com/openai/whisper/discussions/293).
The issue might be related to the handling of long or silent segments in the audio. Whisper has been reported to struggle with long stretches of silence or background noise, and may repeat itself or generate unexpected output in these cases (source: https://github.com/openai/whisper/discussions/29).
As for resolving or avoiding the issue, there are a few potential strategies:
If the audio file contains long segments of silence or background noise, it might help to trim those parts before sending the file for transcription (see the sketch after this list).
If the issue persists, it could be beneficial to report it to OpenAI, providing them with the details and the problematic audio file for further investigation.
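One way to act on the trimming suggestion is to cut leading and trailing silence before handing the file to Whisper. A minimal sketch using pydub; the file names and both thresholds are illustrative assumptions, not something from this thread:
```
# Requires `pip install pydub`, plus ffmpeg on the PATH for non-WAV input.
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

audio = AudioSegment.from_file("voice_message.ogg")  # hypothetical file name

# Locate the spoken regions; both thresholds are guesses that need tuning.
spans = detect_nonsilent(
    audio,
    min_silence_len=1000,            # ignore pauses shorter than 1 s
    silence_thresh=audio.dBFS - 16,  # relative to the clip's average level
)

if spans:
    # Keep everything between the first and last non-silent edge (times in ms).
    start, end = spans[0][0], spans[-1][1]
    audio[start:end].export("voice_message_trimmed.wav", format="wav")
```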
FWIW I had the same domain come up - in this case I think I was using the faster-whisper wrapper with the medium.en model. I was using it live with small samples (4-10 s at a time) and repeatedly got www.fema.com.au back as the transcript. AFAIK there was spoken word in the audio, rather than silence or hum.
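The setup looked roughly like this (reconstructed from memory - the file name is a placeholder, and the VAD filter is one of the mitigations people suggest for exactly this kind of trailing hallucination):
```
from faster_whisper import WhisperModel

model = WhisperModel("medium.en", device="cpu", compute_type="int8")

# vad_filter drops non-speech chunks before decoding.
segments, info = model.transcribe("sample.wav", vad_filter=True)

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```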
Someone else on Reddit yesterday got a similar oddball at the end of their Whisper transcription when talking to ChatGPT, something like “this transcription service provided by xxxx company”.
I suspect if you train an AI on labeled audio data, and there is enough reinforcement that metadata following the apparent end of a transcript is also included and rewarded, you get that metadata showing whose efforts you stole.
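If that is what is happening, a practical stopgap is to strip known hallucination patterns off the end of transcripts before using them. A hypothetical post-filter; the patterns are just the examples mentioned in this thread, not an exhaustive list:
```
import re

# Trailing "metadata" seen in this thread: promo URLs and service credits.
TRAILING_JUNK = re.compile(
    r"(?:for more information[,\s]+visit \S+"      # "...visit www.FEMA.gov"
    r"|this transcription service provided by .+"  # the Reddit example
    r"|www\.\S+)\s*$",
    re.IGNORECASE,
)

def strip_trailing_metadata(transcript: str) -> str:
    """Remove a suspicious promotional/URL fragment appended after the speech."""
    return TRAILING_JUNK.sub("", transcript).rstrip()

print(strip_trailing_metadata(
    "Here is what I actually said. For more information visit www.FEMA.gov"
))  # -> "Here is what I actually said."
```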