Hi. We are building an app that helps users practice speaking a foreign language. Due to certain limitations it currently works as follows: the AI starts the conversation with a question, the user taps a button to speak, the voice is recorded, the user taps to send, and the message is transcribed. Sometimes a user taps to send without having said anything, so an essentially empty message/audio file is sent to the AI. The problem we are facing is that OpenAI hallucinates both a response for the user and its own reply to the empty audio file. Do you have any ideas how we could resolve this? Would be very grateful.
Yes, this is a known problem. Your best bet is to preprocess the audio before sending it for transcription by removing the silent parts. You can use ffmpeg for this. The disadvantages are that the trimming can throw off any timestamps you rely on, and that ffmpeg has to be installed on the server.
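Something along these lines could work as a starting point (a rough sketch that calls ffmpeg's `silenceremove` filter from Python via `subprocess`; the -40 dB threshold and the file paths are placeholders you would tune for your own recordings):

```python
import subprocess

def strip_silence(src: str, dst: str, threshold_db: int = -40) -> None:
    """Trim silence from a recording using ffmpeg's silenceremove filter.

    start_periods=1 trims leading silence; stop_periods=-1 also removes
    silent stretches later in the clip. The threshold is a guess; adjust
    it to your users' mic levels and background noise.
    """
    subprocess.run(
        [
            "ffmpeg", "-y", "-i", src,
            "-af",
            f"silenceremove=start_periods=1:start_threshold={threshold_db}dB:"
            f"stop_periods=-1:stop_threshold={threshold_db}dB",
            dst,
        ],
        check=True,
    )
```

If the trimmed output comes out with essentially zero duration, that is also a cheap signal that the user never spoke and you can skip the API call entirely.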
This is a known quirk of the model when it attempts to process files that contain no speech.
You could always just check the peak volume of the audio file and skip processing it if that falls below a certain threshold.
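For example, something like this (a minimal sketch assuming the recorder produces 16-bit PCM WAV; the 0.02 threshold is a made-up starting point you would tune against real recordings):

```python
import array
import wave

def peak_amplitude(path: str) -> float:
    """Return the peak amplitude of a 16-bit PCM WAV as a fraction of full scale."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0

SILENCE_THRESHOLD = 0.02  # hypothetical cutoff; tune it empirically

def should_send(path: str) -> bool:
    """Only forward the recording to transcription if it isn't effectively silent."""
    return peak_amplitude(path) >= SILENCE_THRESHOLD
```

That way the empty recording never reaches the API, so there is nothing for the model to hallucinate about.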