When using a silent mp3 file

Hi team.

I have a problem.

I use Whisper with TypeScript like this:

import fs from 'fs';
import { Configuration, OpenAIApi } from 'openai';

// Whisper API upload limit: 25 MB
const maxMp3FileSize = 25 * 1024 * 1024;

const configuration = new Configuration({
  organization: organization,
  apiKey: apiKey,
  baseOptions: {
    maxBodyLength: maxMp3FileSize,
  },
});
const openai = new OpenAIApi(configuration);

const filepath = "path to mp3 file.";

const results = await openai.createTranscription(
  fs.createReadStream(filepath) as any,
  'whisper-1',
  undefined, // prompt
  'vtt',     // response format
  undefined, // temperature
  'ja',      // language
  {
    maxBodyLength: maxMp3FileSize,
  }
);

But the result is:

{
  data: '"WEBVTT\\n\\n00:00:00.000 --> 00:00:20.000\\nご視聴ありがとうございました\\n\\n00:00:30.000 --> 00:00:40.000\\nご視聴ありがとうございました\\n\\n"'
}

The mp3 file is silent. Nothing is spoken in it, but the response still contains text.
I want to receive empty text, or otherwise avoid getting this mistaken text.
How can I solve this problem?

The openai version is:

    "openai": "^3.3.0",

Related GitHub discussion:
@github/openai/whisper/discussions/1731

Here’s a code dump to strip silences out of a file:

You can remove references to other codecs you wouldn’t be using.

However, your file is, with the exception of a glitch around 20 seconds in, pure digital silence (all-zero samples), although you'd still need to decode the mp3 to discover that. You'll likely want to check the length after silence removal to make sure there's enough audio left to be worth sending to Whisper.
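
To illustrate the length check, here is a minimal sketch that is not the code dump referred to above. It assumes the mp3 has already been decoded to mono PCM floats in [-1, 1] by some other tool; the names `stripSilence` and `worthSending`, the threshold, the frame size, and the minimum duration are all hypothetical values chosen for illustration.

```typescript
// Assumption: `samples` is mono PCM already decoded from the mp3,
// with values in [-1, 1]. Threshold and frame size are illustrative only.
function stripSilence(
  samples: Float32Array,
  sampleRate: number,
  threshold = 0.01,
  frameMs = 20
): Float32Array {
  const frameLen = Math.max(1, Math.floor((sampleRate * frameMs) / 1000));
  const kept: number[] = [];
  for (let start = 0; start < samples.length; start += frameLen) {
    const end = Math.min(start + frameLen, samples.length);
    // Peak absolute amplitude of this frame
    let peak = 0;
    for (let i = start; i < end; i++) {
      peak = Math.max(peak, Math.abs(samples[i]));
    }
    // Keep the frame only if it is louder than the threshold
    if (peak >= threshold) {
      for (let i = start; i < end; i++) {
        kept.push(samples[i]);
      }
    }
  }
  return Float32Array.from(kept);
}

// True if enough audio remains to be worth sending for transcription.
// The 0.5 second default is a hypothetical cut-off.
function worthSending(
  samples: Float32Array,
  sampleRate: number,
  minSeconds = 0.5
): boolean {
  return samples.length / sampleRate >= minSeconds;
}
```

With a check like this, a fully silent file never reaches the API, and the caller can return empty text directly.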

Then, if you want to ensure that translation is done well and that silence isn't hallucinated on, you can append every chunk of audio to a few-second introduction clip that sets the level and primes the language. Use a phrase so reliably recognized that it can be stripped from the output by regex, or simply by cutting at the end of its sentence.
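
Stripping the introduction phrase could look roughly like the sketch below. It assumes a plain-text response rather than vtt (with vtt, the cue text would have to be handled per cue), and the function names and the example phrase are hypothetical.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove a known introduction phrase (plus trailing punctuation and
// whitespace) from the start of a transcript. `introPhrase` is whatever
// sentence was spoken in the prepended introduction audio.
function stripIntro(transcript: string, introPhrase: string): string {
  const pattern = new RegExp(
    "^\\s*" + escapeRegExp(introPhrase) + "[\\s。.!?!?]*",
    "u"
  );
  return transcript.replace(pattern, "").trim();
}
```

If the result is an empty string after stripping, the chunk contained nothing beyond the introduction and can be treated as silence.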

Thank you so much!
I’ll try it!:bowing_man: