How can I make Whisper return an empty string if no one spoke?

Hello,

I am currently adding voice recording to my application. I then send it to the createTranscription API and I receive the text back.

It all works fine with one exception. When I don’t speak, I would expect it to return an empty string; instead, I get seemingly random pieces of text as a response. Some examples:

“Radio ondertiteld door de Amara gemeenschap” (radio subtitled by Amara)

“Zo zullen we de binnenkant nog uitspreken.” (so we will pronounce the inside)

These are just so random lol. I am Dutch, however, so maybe there is some sort of interference from my PC when it records? I tried adding a prompt telling it to just return an empty string, but the behaviour stays the same:

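const { Readable } = require("stream");
const { Configuration, OpenAIApi } = require("openai");

// Express handler: expects the uploaded audio buffer on req.file
async function transcribeSpeech(req, res) {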
  try {
    const { file } = req;
    const { language } = req.body;

    if (!file) {
      return res.status(400).json({ message: "No file uploaded" });
    }

    const configuration = new Configuration({
      organization: "org-qLyCCwhzH22H7KuqikplNsgg",
      apiKey: process.env.OPENAI_API_KEY,
    });

    const openai = new OpenAIApi(configuration);

    const audioReadStream = Readable.from(file.buffer);
    audioReadStream.path = "speech.wav";

    const result = await openai.createTranscription(
      audioReadStream,
      "whisper-1",
      'if there is no speech, just return nothing',
      undefined,
      undefined,
      language
    );

    console.log('result', result);

    return res.status(200).json({ transcription: result.data.text });
  } catch (error) {
    sendErr(res, error, "An error occurred while transcribing the speech");
  }
}

That’s an issue with Whisper itself: it tends to invent text when the audio contains no speech, so you’ll need to implement a workaround yourself. No fix is 100% reliable, but you can try to filter the output through a GPT-3.5 step or similar.
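A rough sketch of what such a GPT-3.5 filtering step could look like with the same v3 Node SDK used in the question. The function name `filterTranscription`, the `SILENCE` sentinel and the system prompt wording are all made up for illustration, so treat them as a starting point to tune rather than a known-good recipe:

```javascript
// Sketch only: the sentinel and the prompt wording below are assumptions.
const SILENCE = "SILENCE";

// `openai` is an OpenAIApi instance (v3 Node SDK), `text` is Whisper's output.
// Returns "" when GPT-3.5 judges the text to be an artifact, else the trimmed text.
async function filterTranscription(openai, text) {
  const trimmed = (text || "").trim();
  if (trimmed === "") return ""; // nothing to check, skip the extra API call

  const completion = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "You check speech-to-text output from a voice recorder. " +
          "If the text looks like an artifact produced from silence " +
          "(for example subtitle credits) rather than something a user " +
          `would dictate, reply with exactly ${SILENCE}. ` +
          "Otherwise reply with exactly OK.",
      },
      { role: "user", content: trimmed },
    ],
  });

  const verdict = completion.data.choices[0].message.content.trim();
  return verdict === SILENCE ? "" : trimmed;
}
```

In the handler from the question this would wrap the result, e.g. `transcription: await filterTranscription(openai, result.data.text)`. It costs an extra request per recording and can still misjudge short real utterances, so it is worth logging what gets dropped.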

I’ve talked about this a bit here: Reading videos with GPT4V - #4 by Fusseldieb (Scroll a bit down and you’ll see it)
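Another option in the "or similar" category, without a second model call: request the `verbose_json` response format, which returns per-segment fields such as `no_speech_prob` and `avg_logprob`, and drop the segments that look like silence. The sketch below mirrors the skip rule of the open-source Whisper decoder; the helper name `textFromSpokenSegments` is hypothetical and the thresholds are that project's defaults, which may well need tuning for your recordings:

```javascript
// Assumed thresholds, borrowed from the open-source Whisper defaults
// (no_speech_threshold = 0.6, logprob_threshold = -1.0).
const NO_SPEECH_THRESHOLD = 0.6;
const LOGPROB_THRESHOLD = -1.0;

// `segments` is the `segments` array of a verbose_json transcription response.
// A segment is dropped only when it is both likely silence AND low confidence.
function textFromSpokenSegments(segments) {
  return (segments || [])
    .filter(
      (s) =>
        !(
          s.no_speech_prob > NO_SPEECH_THRESHOLD &&
          s.avg_logprob < LOGPROB_THRESHOLD
        )
    )
    .map((s) => s.text.trim())
    .join(" ")
    .trim();
}
```

To use it, pass `"verbose_json"` as the fourth argument (the response format) of `createTranscription` instead of `undefined`, then return `textFromSpokenSegments(result.data.segments)`. Hallucinated text does not always come with a high `no_speech_prob`, so this will not catch everything either.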
