URL: OpenAI Platform
The audio sample provided is 50MB, twice the API limit (25MB). There is no mention of this in the documentation, nor of the need to convert the audio to a compressed format in order to reduce its size.
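For anyone else hitting the same wall, one workaround is to re-encode the sample to a compressed format before uploading. A minimal sketch using pydub (the filenames are placeholders, and the bitrate needed depends on the source file):

from pydub import AudioSegment

# Load the oversized sample and re-encode it as mp3 to get it
# under the 25MB upload limit; 64kbps is usually plenty for speech.
audio = AudioSegment.from_file("sample.wav")
audio.export("sample.mp3", format="mp3", bitrate="64k")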
Thanks, keep up the good work
JT
Please see this section of the documentation:
_j
Whisper paper:
- Long-form Transcription
Whisper models are trained on 30-second audio chunks and cannot consume longer audio inputs at once. This is not a problem with most academic datasets comprised of short utterances but presents challenges in real-world applications which often require transcribing minutes- or hours-long audio. We developed a strategy to perform buffered transcription of long audio by consecutively transcribing 30-second segments of audio and shifting the window according to the timestamps predicted by the model. We observed that it is crucial to have beam search and temperature scheduling based on the repetitiveness and the log probability of the model predictions in order to reliably transcribe long audio. The full procedure is described in Section 4.5.
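As an aside, the open-source whisper package implements this buffered sliding-window strategy internally, so running the model locally handles long files without manual chunking. A minimal sketch (model size and filename are just examples):

import whisper

# load_model downloads and caches the checkpoint; "base" is a small, fast option
model = whisper.load_model("base")

# transcribe() slides a 30-second window across the whole file internally
result = model.transcribe("long_recording.mp3")
print(result["text"])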
If you can chunk your own audio based on silence detection, you will likely get better performance than sending big audio to the API.
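A rough sketch of what silence-based chunking could look like with pydub (the threshold and silence-length values are guesses you would tune per recording):

from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_file("long_recording.mp3")

# Split wherever at least 700ms of audio sits below -40 dBFS,
# keeping a little silence on each side so words aren't clipped.
chunks = split_on_silence(
    audio,
    min_silence_len=700,
    silence_thresh=-40,
    keep_silence=200,
)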
Thanks all. I know and have read what you are linking to; this is feedback for the creator of the official tutorial. I am highlighting that the provided sample will create unnecessary friction for new users.
terps
Agreed, I think mentioning the need for chunking would make the tutorial more seamless. Here was my fix:
import openai
from pydub import AudioSegment
from docx import Document

# Function to split the audio file into chunks
def split_audio(audio_file_path, chunk_length_ms=30000):  # chunk_length_ms is 30 seconds by default
    audio = AudioSegment.from_file(audio_file_path)
    chunks = []
    for i in range(0, len(audio), chunk_length_ms):
        chunks.append(audio[i:i + chunk_length_ms])
    return chunks

# Function to transcribe audio
def transcribe_audio_chunks(chunks):
    transcription = ""
    for i, chunk in enumerate(chunks):
        # Export chunk to a temporary file
        chunk_file = f'audio/chunk{i}.wav'
        chunk.export(chunk_file, format="wav")
        with open(chunk_file, 'rb') as audio_file:
            response = openai.Audio.transcribe("whisper-1", audio_file)
        transcription += response['text'] + " "  # Add a space between chunks to separate words
    return transcription

# Main function to transcribe an audio file
def transcribe_audio(audio_file_path):
    chunks = split_audio(audio_file_path)
    transcription = transcribe_audio_chunks(chunks)
    return transcription
Adjust the chunk_file path as needed; the audio/ directory must exist before the chunks are exported.
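The docx import suggests the transcript was being saved to a Word file; a minimal usage sketch along those lines (filenames are hypothetical):

transcription = transcribe_audio("interview.mp3")

# Write the result into a Word document using python-docx
doc = Document()
doc.add_paragraph(transcription)
doc.save("transcription.docx")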