Hello. I wrote a simple audio-to-text transcription and summarization app using the OpenAI cookbook. It usually works great.
But today I tried creating the meeting minutes for a small audio file (<10 MB), and Whisper processed the file and reported that it had no content! When I tried again, it summarized parts of the file and completely 'hallucinated' the rest, producing new information totally unrelated to the actual conversation. Have other users faced this with such basic use cases in the audio-to-text modality as well? Any guidance on how to reduce the likelihood of such 'catastrophic' failures? Thanks.
In the code you shared, the transcription object is created inside the with block, so make sure you return it (or at least capture its text) from within that block. If you have difficulty with the code, use the boilerplate code from the API reference.
Also, if the language of the audio is known, setting the language parameter greatly improves the accuracy of the transcription.