Feedback: Whisper, "Meeting-minutes" tutorial audio sample issue

URL: OpenAI Platform

The audio sample provided is 50MB, twice the API limit (25MB). The documentation makes no mention of this, nor of the need to convert the audio format to reduce its size.
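
For anyone hitting this in the meantime, re-encoding to a compressed format gets the sample under the limit. A minimal sketch with pydub, assuming the sample is an uncompressed WAV (file names and bitrate are illustrative):

from pydub import AudioSegment

# Re-encode the sample as a 64 kbps MP3; for a ~50MB WAV this lands well under 25MB
audio = AudioSegment.from_file("sample_audio.wav")
audio.export("sample_audio.mp3", format="mp3", bitrate="64k")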

Thanks, keep up the good work

JT


Please see this section of the Whisper paper:

  Long-form Transcription
    Whisper models are trained on 30-second audio chunks and cannot
    consume longer audio inputs at once. This is not a problem with
    most academic datasets comprised of short utterances but presents
    challenges in real-world applications which often require
    transcribing minutes- or hours-long audio. We developed a strategy
    to perform buffered transcription of long audio by consecutively
    transcribing 30-second segments of audio and shifting the window
    according to the timestamps predicted by the model. We observed
    that it is crucial to have beam search and temperature scheduling
    based on the repetitiveness and the log probability of the model
    predictions in order to reliably transcribe long audio. The full
    procedure is described in Section 4.5.
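
For local transcription, the open-source whisper package implements this buffered strategy for you. A sketch of the relevant knobs (the temperature schedule and thresholds shown are the library defaults, beam_size=5 matches the paper's setup, and the file name is illustrative):

import whisper

# Long-form transcription with temperature fallback, as described in the quote above
model = whisper.load_model("base")
result = model.transcribe(
    "long_meeting.mp3",
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback schedule when decoding fails quality checks
    compression_ratio_threshold=2.4,  # retry at higher temperature if output looks too repetitive
    logprob_threshold=-1.0,           # retry at higher temperature if average log probability is too low
    beam_size=5,                      # beam search for more reliable decoding
)
print(result["text"])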

If you can chunk your own audio based on silence detection, you will likely get better performance than sending big audio to the API.
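
A minimal sketch with pydub's silence splitter (the thresholds are assumptions and will need tuning per recording):

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("meeting_audio.wav")  # illustrative file name
chunks = split_on_silence(
    audio,
    min_silence_len=700,             # a pause of 700 ms or more becomes a split point
    silence_thresh=audio.dBFS - 14,  # "silence" is 14 dB below the clip's average loudness
    keep_silence=200,                # keep 200 ms of padding so words are not clipped
)
for i, chunk in enumerate(chunks):
    chunk.export(f"chunk{i}.wav", format="wav")

Splitting at pauses means no word gets cut in half at an arbitrary 30-second boundary.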


Thanks all. I know and have read what you are linking to; this is feedback for the creator of the official tutorial. I am highlighting that the provided sample will create unnecessary friction for new users.


It’s there now!

Agreed, I think mentioning the need for chunking would make the tutorial more seamless. Here was my fix:

import os
import openai  # this snippet targets the pre-1.0 openai library, which the tutorial uses
from pydub import AudioSegment
from docx import Document  # used later in the tutorial to write out the minutes

# Function to split the audio file into chunks
def split_audio(audio_file_path, chunk_length_ms=30000): # chunk_length_ms is 30 seconds by default
    audio = AudioSegment.from_file(audio_file_path)
    chunks = []
    
    for i in range(0, len(audio), chunk_length_ms):
        chunks.append(audio[i:i+chunk_length_ms])
    return chunks

# Function to transcribe each chunk with the Whisper API and stitch the text together
def transcribe_audio_chunks(chunks):
    transcription = ""
    os.makedirs('audio', exist_ok=True)  # make sure the temp directory exists
    for i, chunk in enumerate(chunks):
        # Export the chunk to a temporary file the API can read
        chunk_file = f'audio/chunk{i}.wav'
        chunk.export(chunk_file, format="wav")
        
        with open(chunk_file, 'rb') as audio_file:
            response = openai.Audio.transcribe("whisper-1", audio_file)
            transcription += response['text'] + " "  # space between chunks so words don't run together
        os.remove(chunk_file)  # clean up the temporary file
    return transcription

# Main function to transcribe an audio file
def transcribe_audio(audio_file_path):
    chunks = split_audio(audio_file_path)
    transcription = transcribe_audio_chunks(chunks)
    return transcription

Adjust the chunk_file path as needed.
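
Usage is then just (file name illustrative):

transcription = transcribe_audio("meeting_audio.wav")
print(transcription)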