When using a silent mp3 file

hirasaki1985 · October 25, 2023, 1:14am

Hi team.

I have a problem.

I'm using Whisper from TypeScript like this:

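import * as fs from 'fs';
import { Configuration, OpenAIApi } from 'openai';

// The Whisper API rejects uploads larger than 25 MB.
const maxMp3FileSize = 25 * 1024 * 1024;

// Assumed: credentials are read from environment variables.
const organization = process.env.OPENAI_ORGANIZATION;
const apiKey = process.env.OPENAI_API_KEY;

const configuration = new Configuration({
  organization: organization,
  apiKey: apiKey,
  baseOptions: {
    maxBodyLength: maxMp3FileSize,
  },
});
const openai = new OpenAIApi(configuration);

const filepath = 'path to mp3 file.';

// openai v3 argument order: file, model, prompt, responseFormat, temperature, language, axios options
const results = await openai.createTranscription(
  fs.createReadStream(filepath) as any, // cast: the v3 typings expect a browser File
  'whisper-1',
  undefined, // prompt
  'vtt', // response format
  undefined, // temperature
  'ja', // language
  {
    maxBodyLength: maxMp3FileSize,
  }
);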

but the result is:

{
  data: '"WEBVTT\\n\\n00:00:00.000 --> 00:00:20.000\\nご視聴ありがとうございました\\n\\n00:00:30.000 --> 00:00:40.000\\nご視聴ありがとうございました\\n\\n"'
}

The MP3 file is silent (I don't say anything), but the response contains text: "ご視聴ありがとうございました" ("Thank you for watching").
I want to receive an empty transcript, or at least not get this mistaken text.
How can I solve this problem?
Please let me know.

My openai package version:

    "openai": "^3.3.0",

I also asked this in a GitHub discussion:
→ @github/openai/whisper/discussions/1731
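For the "don't catch mistaken texts" option, one stopgap is to post-filter the VTT output. The sketch below is plain TypeScript, not part of any OpenAI SDK: it parses a WebVTT string into cues and drops cues whose text matches a hand-maintained blocklist. `parseVtt`, `dropHallucinatedCues` and `HALLUCINATION_PHRASES` are hypothetical names, and the blocklist holds only the phrase seen in the output above. The parser is deliberately minimal and ignores cue identifiers, cue settings and NOTE/STYLE blocks.

```typescript
// One WebVTT cue: timestamps as written in the file, plus the cue text.
interface Cue {
  start: string;
  end: string;
  text: string;
}

// Minimal WebVTT parser: finds the "-->" timing line in each blank-line-separated
// block and treats the following lines as the cue text.
function parseVtt(vtt: string): Cue[] {
  const cues: Cue[] = [];
  const blocks = vtt.replace(/\r\n/g, '\n').split(/\n{2,}/);
  for (const block of blocks) {
    const lines = block.split('\n');
    const timingIndex = lines.findIndex((line) => line.includes('-->'));
    if (timingIndex === -1) continue; // "WEBVTT" header or an empty/non-cue block
    const [start, rest] = lines[timingIndex].split('-->').map((part) => part.trim());
    const end = rest.split(/\s+/)[0]; // drop cue settings such as "align:start"
    const text = lines.slice(timingIndex + 1).join('\n').trim();
    cues.push({ start, end, text });
  }
  return cues;
}

// Hypothetical blocklist of phrases Whisper has produced on silent audio.
const HALLUCINATION_PHRASES: string[] = ['ご視聴ありがとうございました'];

// Keep only cues whose text (minus trailing punctuation) is non-empty
// and not an exact match for a blocklisted phrase.
function dropHallucinatedCues(cues: Cue[], phrases: string[] = HALLUCINATION_PHRASES): Cue[] {
  return cues.filter((cue) => {
    const normalized = cue.text.replace(/[\s。、！!.]+$/u, '').trim();
    return normalized !== '' && !phrases.includes(normalized);
  });
}
```

If the filtered list is empty, the file can be treated as silent. The trade-off is that a genuinely spoken "ご視聴ありがとうございました" is dropped too, so a blocklist works best alongside some form of silence detection.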

_j · October 25, 2023, 1:43am

Here's a code dump that strips silence out of a file:


https://gist.github.com/smashah/fb7bd9a57dd2181d4142886888f99b92

remove_silence.ts

//add this to your package.json

// "audio-buffer-utils": "^5.1.2",
// "audio-decode": "^1.4.0",
// "audiobuffer-to-wav": "^1.0.0",
// "node-lame": "^1.2.0",
// "ogg.js": "^0.1.0",
// "opus.js": "^0.1.1",

const fs = require('fs');

(This file has been truncated.)

You can remove the dependencies for codecs you won't be using.

However, apart from a glitch about 20 seconds in, your file is pure digital silence (every sample is zero), though you'd still need to decode the MP3 to find that out. After removing silence, you'll likely want to check the remaining length to make sure there's even enough audio worth sending to Whisper.
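A minimal sketch of that length check, assuming the MP3 has already been decoded to mono float PCM in [-1, 1] (for example with a decoder such as the `audio-decode` package the gist lists). It uses a simple frame-energy test rather than the gist's own algorithm: 20 ms frames whose RMS exceeds a threshold count as non-silent. The function names, the 0.01 RMS threshold and the 1-second minimum are illustrative placeholders to tune, not values from this thread.

```typescript
// Root-mean-square amplitude of samples[start, end).
function frameRms(samples: Float32Array, start: number, end: number): number {
  let sumSquares = 0;
  for (let i = start; i < end; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / (end - start));
}

// Seconds of audio in frames whose RMS is above the threshold.
// Assumes mono float samples in [-1, 1]; threshold and frame size are placeholders.
function voicedSeconds(
  samples: Float32Array,
  sampleRate: number,
  rmsThreshold = 0.01,
  frameMs = 20,
): number {
  const frameSize = Math.max(1, Math.round((sampleRate * frameMs) / 1000));
  let voicedSamples = 0;
  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    if (frameRms(samples, start, end) > rmsThreshold) voicedSamples += end - start;
  }
  return voicedSamples / sampleRate;
}

// Hypothetical gate: only upload to Whisper if enough non-silent audio remains.
function worthTranscribing(samples: Float32Array, sampleRate: number, minSeconds = 1): boolean {
  return voicedSeconds(samples, sampleRate) >= minSeconds;
}
```

A file like the one in the question (all zeros apart from a short glitch) should come out at a small fraction of a second of non-silent audio, well under the minimum, so it would be skipped instead of uploaded.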

Then, if you want to make sure transcription is done well and Whisper doesn't hallucinate on silence, you can prepend a few-second introduction clip to every chunk of audio; it sets the level and primes the language. Use a phrase so reliable that it can be stripped from the output by regex, or simply by cutting at the end of its sentence.
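A sketch of the regex-stripping step, assuming a plain-text transcript (response_format 'text') and a hypothetical intro phrase that you record yourself. `stripIntro` returns the text after the phrase, an empty string if nothing follows it (which suggests the chunk itself was silent), or `null` if the phrase is missing, which may mean the transcript shouldn't be trusted.

```typescript
// Escape regex metacharacters so the phrase is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Remove a known introduction phrase (plus trailing punctuation/whitespace)
// from the start of a transcript. Returns null if the phrase is not found.
function stripIntro(transcript: string, introPhrase: string): string | null {
  const pattern = new RegExp('^\\s*' + escapeRegExp(introPhrase) + '[\\s。、.,!！?？]*');
  if (!pattern.test(transcript)) return null;
  return transcript.replace(pattern, '').trim();
}
```

Escaping the phrase keeps punctuation inside it (parentheses, periods) from being read as regex syntax; an empty string result is the "no one spoke" case the question asks for.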

hirasaki1985 · October 25, 2023, 9:47am

Thank you so much!
I’ll try it!
