Mitigating Random Text Generation in OpenAI Whisper Transcriptions

dp09 · May 30, 2024, 5:10am

Introduction

When using the OpenAI Whisper model for transcribing audio, users often encounter the problem of random text generation, known as hallucinations. This issue primarily arises when the input audio contains significant silence or noise. Here, we share an effective method to mitigate this issue based on careful observation and strategic use of prompts.

Problem

The Whisper model tends to transcribe random text when it processes silent or noisy audio segments. This hallucination problem can be particularly troublesome when dealing with entire recordings that include periods of silence or low-level noise.

Observation

During testing, it was observed that adding specific strings to the end of the prompt influenced the model’s hallucinations. To confirm this, the string $$$ was appended to the prompt. The resulting transcriptions frequently included $$$ when hallucinations occurred. Additionally, when the audio was entirely silent, the model returned only $$$, and for partial silence with noise, it returned $$ followed by random words.
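
The three outcomes observed above can be told apart with a small helper. This is only a sketch: the function name `classifyTranscription` and the labels it returns are hypothetical, and it assumes the marker appended to the prompt is `$$$`.

```javascript
// Hypothetical helper (not part of the original post): labels a transcription
// according to the behaviour observed with the "$$$" prompt marker.
function classifyTranscription(text) {
    const trimmed = text.trim();
    // Entirely silent audio: the model returned only the marker.
    if (/^\$\$+$/.test(trimmed)) {
        return 'silence';
    }
    // Partial silence with noise: "$$" followed by random words.
    if (trimmed.includes('$$')) {
        return 'hallucination';
    }
    // No marker found, so the text is treated as a normal transcription.
    return 'clean';
}
```

A caller could skip segments labelled `silence` and only re-run the ones labelled `hallucination`.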

Solution

The solution involves several key steps:

Custom Prompt:
Append a specific pattern ($$$) to the end of the prompt. This pattern helps to identify when the model generates random text during silent segments.

```
function transcribeAudio(blob) {
    const formData = new FormData();
    formData.append('file', blob);
    formData.append('model', 'whisper-1');
    formData.append('temperature', '0');
    formData.append('prompt', `Please transcribe the following audio accurately. technology, innovation, future, AI, $$$`);

    return fetch('https://api.openai.com/v1/audio/transcriptions', {
        method: 'POST',
        headers: {
            'Authorization': `Bearer YOUR_API_KEY`,
        },
        body: formData
    }).then(response => response.json());
}
```

Pattern Detection:
Use a regular expression to detect and filter out responses containing $$$ or $$ followed by random words. This ensures that only valid transcriptions are processed further.
```
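// Returns true when the transcription contains no "$$" marker,
// i.e. when it shows no sign of the hallucination pattern.
function bypassDollarStrings(s) {
    // The `s` flag lets `.` match newlines, so a marker on any line is caught.
    const pattern = /^(?!.*\$\$).*/s;
    return pattern.test(s);
}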
```
Retry Mechanism:
Set up a retry mechanism that restarts the transcription process if the response contains the $$$ or $$ pattern, indicating hallucination.
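
A minimal sketch of such a retry loop follows. The name `transcribeWithRetry`, the `maxAttempts` cap, and the choice to return `null` when every attempt fails are assumptions for illustration, not part of the original method. The `transcribe` argument is expected to behave like the `transcribeAudio` function from the Custom Prompt step, resolving to the API's JSON response with a `text` field.

```javascript
// Hypothetical helper: re-runs a transcription until the "$$" marker is absent.
// `transcribe` is any function that returns a promise of { text }, for example
// the transcribeAudio function shown earlier. `maxAttempts` is an assumed cap
// that keeps the loop from running forever on audio that is entirely silent.
async function transcribeWithRetry(transcribe, blob, maxAttempts = 3) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const result = await transcribe(blob);
        const text = (result && typeof result.text === 'string') ? result.text : '';
        // No marker in the response: accept it as a valid transcription.
        if (!text.includes('$$')) {
            return text;
        }
    }
    // Every attempt contained the marker; report that no usable text was found.
    return null;
}
```

Usage would look like `transcribeWithRetry(transcribeAudio, blob).then(text => ...)`. A `null` result can be treated as "no speech detected", since fully silent audio was observed to return only `$$$`.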

Conclusion

This approach combines a marker appended to the prompt with a retry mechanism to detect and discard hallucinated output in OpenAI Whisper transcriptions. Using these techniques can improve the reliability of transcriptions, especially for recordings with periods of silence or noise.
