Whisper transcription failures and hallucinations

Hello. I wrote a simple Audio to text summarization and transcription app using the OpenAI cookbook. It usually works great.

But today I tried creating meeting minutes for a small audio file (<10 MB), and Whisper processed the file and reported that it had no content! When I tried again, it summarized parts of the file and completely 'hallucinated' the rest, producing new information totally unrelated to the actual conversation. Have other users faced this with such basic audio-to-text use cases as well? Any guidance on how to reduce the likelihood of such 'catastrophic' failures? Thanks.


Welcome to the dev forum @shontoron42

whisper-1 can’t summarise. Can you please share the specifics of the API call that leads to this issue?


The summarization isn't done through the Whisper API; that's a separate function call. Here's the API call for transcription:

It's the whisper model that seems to have been unreliable, at least in these two instances; it generally works great.

def transcribe_audio(audio_file_path):
    """Send an audio file to the Whisper API and return the transcript text."""
    with open(audio_file_path, 'rb') as audio_file:
        transcription = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
    return transcription.text

In the code you shared, the transcription object is created inside the with block. In Python the variable remains accessible after the block exits, so the return works, but returning from within the block (or starting from the boilerplate code in the API reference) makes the intent clearer and less error-prone.

Also, if the language of the audio is known, setting the language parameter greatly improves the accuracy of the transcription.
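To illustrate, here is a minimal sketch of the same function with the language parameter pinned. The client is passed in as an argument here purely for testability; in the original app it would be the module-level OpenAI client. The default of "en" is an assumption for illustration; use the ISO-639-1 code of your actual audio.

```python
def transcribe_audio(client, audio_file_path, language="en"):
    """Transcribe an audio file with whisper-1, pinning the spoken language.

    Setting `language` (an ISO-639-1 code such as "en") skips Whisper's
    language auto-detection, which can misfire on short or noisy files.
    """
    with open(audio_file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language=language,
        )
    return transcription.text
```

The `prompt` parameter can also help: passing a short phrase with domain terms or proper nouns from the meeting nudges the model toward the expected vocabulary.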


Thank you very much! Much appreciated.
