Introduction
When using the OpenAI Whisper model for transcribing audio, users often encounter the problem of random text generation, known as hallucinations. This issue primarily arises when the input audio contains significant silence or noise. Here, we share an effective method to mitigate this issue based on careful observation and strategic use of prompts.
Problem
The Whisper model tends to transcribe random text when it processes silent or noisy audio segments. This hallucination problem can be particularly troublesome when dealing with entire recordings that include periods of silence or low-level noise.
Observation
During testing, it was observed that adding specific strings to the end of the prompt influenced the model’s hallucinations. To confirm this, the string $$$ was appended to the prompt. The resulting transcriptions frequently included $$$ when hallucinations occurred. Additionally, when the audio was entirely silent, the model returned only $$$, and for partial silence with noise, it returned $$ followed by random words.
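The three outcomes described above can be told apart with a small helper. This is only a sketch: the function name `classifyTranscript` and the labels `'silence'`, `'hallucination'`, and `'clean'` are hypothetical and chosen for illustration, and it assumes the marker behaves as observed in the testing described here.

```javascript
// Hypothetical helper (not from the original post): maps a transcript to one
// of the three cases observed during testing.
function classifyTranscript(text) {
  const trimmed = text.trim();

  // Entirely silent audio: the response was only the marker itself.
  if (/^\${2,}$/.test(trimmed)) {
    return 'silence';
  }

  // Partial silence with noise: "$$" (or "$$$") appears alongside random words.
  if (trimmed.includes('$$')) {
    return 'hallucination';
  }

  // No marker found: treat the transcript as usable.
  return 'clean';
}
```

Separating the silent case from the noisy one is a design choice: a response that is only the marker may simply mean there was nothing to transcribe, which a caller might handle differently from a transcript polluted with random words.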
Solution
The solution involves several key steps:
- Custom Prompt: Append a specific pattern ($$$) to the end of the prompt. This pattern helps to identify when the model generates random text during silent segments.

    // Send an audio blob to the transcription endpoint, with the $$$ marker
    // appended to the end of the prompt.
    function transcribeAudio(blob) {
      const formData = new FormData();
      formData.append('file', blob);
      formData.append('model', 'whisper-1');
      formData.append('temperature', '0');
      formData.append('prompt', `Please transcribe the following audio accurately. technology, innovation, future, AI, $$$`);

      return fetch('https://api.openai.com/v1/audio/transcriptions', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer YOUR_API_KEY`,
        },
        body: formData
      }).then(response => {
        // Surface HTTP errors instead of parsing an error body as a transcript.
        if (!response.ok) {
          throw new Error(`Transcription request failed: ${response.status}`);
        }
        return response.json();
      });
    }
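- Pattern Detection: Use a regular expression to detect and filter out responses containing $$$ or $$ followed by random words. This ensures that only valid transcriptions are processed further. Because $$$ itself contains $$, a single check for $$ covers both cases.

    // Returns true when the text contains no "$$" marker, i.e. it can be kept.
    // The s (dotAll) flag lets "." match newlines, so a marker on a later line
    // of a multi-line transcript is also caught.
    function bypassDollarStrings(s) {
      const pattern = /^(?!.*\$\$).*/s;
      return pattern.test(s);
    }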
- Retry Mechanism: Set up a retry mechanism that restarts the transcription process if the response contains the $$$ or $$ pattern, indicating hallucination.
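The retry step is described only in words, so here is one way it might look. This is a minimal sketch under stated assumptions: `transcribeWithRetry`, `containsMarker`, and the `maxAttempts` cap are hypothetical names introduced for illustration, and the transcription function is passed in as an argument so that any function returning a Promise of the API's JSON response (an object with a `text` field) can be used.

```javascript
// Hypothetical helper: true when the transcript carries the "$$" marker
// ("$$$" also contains "$$", so one check covers both patterns).
function containsMarker(text) {
  return text.includes('$$');
}

// Sketch of a retry loop (not from the original post).
// `transcribe` is any function that takes an audio blob and returns a Promise
// resolving to an object with a `text` field.
// `maxAttempts` is an assumed cap so that fully silent audio, which may keep
// returning the marker, cannot cause an endless loop.
async function transcribeWithRetry(transcribe, blob, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await transcribe(blob);
    const text = (result && result.text) || '';

    if (!containsMarker(text)) {
      return text; // no marker: accept this transcript
    }
    // Marker found: treat the response as a hallucination and try again.
  }

  // Every attempt contained the marker; the caller decides what to do next.
  return null;
}
```

With the function from the Custom Prompt step, a call might look like `transcribeWithRetry(transcribeAudio, blob)`. Returning `null` after the last attempt, rather than throwing, is one possible choice; it lets the caller treat the segment as containing no usable speech.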
Conclusion
This approach combines a marker pattern in the prompt, detection of that marker in the response, and a retry mechanism to mitigate hallucinations in OpenAI Whisper transcriptions. Together, these techniques can improve the reliability of transcriptions, especially in recordings with periods of silence or noise.