Whisper API: How to upload a file larger than 25 MB

I code in TypeScript. I'm trying to make a simple web app that converts video/audio to transcripts with the OpenAI Whisper API. Since the Whisper API only supports files smaller than 25 MB, I split larger files into chunks and then transcribe the audio into text, but I get an error. Here's the error output:

Error in getTranscript: AxiosError: Request failed with status code 400 Error: Cannot set headers after they are sent to the client

Hi!
Are you making two separate calls to the API, one for each chunk?
In that case each request gets its own headers, which should resolve the issue if it's just two files.
You then need to join the transcripts back together after the results are returned.
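
For what it's worth, the "Cannot set headers after they are sent to the client" part of that error usually means the Express handler is responding more than once, e.g. calling res.json inside the per-chunk loop. Here's a minimal sketch of transcribing several chunks and responding a single time; it assumes axios and form-data, and the route name and chunk paths are just placeholders:

import express from "express";
import axios from "axios";
import FormData from "form-data";
import fs from "fs";

const app = express();

// Hypothetical route: assumes the audio has already been split into
// chunk files that are each under the 25 MB limit.
app.post("/transcribe", async (_req, res) => {
  const chunkPaths: string[] = ["chunk_01.mp3", "chunk_02.mp3"]; // placeholder paths
  try {
    const transcripts: string[] = [];
    for (const path of chunkPaths) {
      const form = new FormData();
      form.append("file", fs.createReadStream(path));
      form.append("model", "whisper-1");
      const response = await axios.post(
        "https://api.openai.com/v1/audio/transcriptions",
        form,
        {
          headers: {
            ...form.getHeaders(),
            Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          },
          maxBodyLength: Infinity, // allow large multipart bodies
        }
      );
      transcripts.push(response.data.text);
    }
    // Respond exactly once, after every chunk has been transcribed.
    res.json({ transcript: transcripts.join(" ") });
  } catch (err) {
    res.status(500).json({ error: "Transcription failed" });
  }
});

app.listen(3000);

The key point is that res.json runs once, after the loop, so Express only sets the response headers a single time.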

Also, you can check this resource from the forum:


Another useful strategy is to chunk it with overlap.

Figure on about 10 to 30 seconds of overlap to ensure good coverage.

ETA: If you’re using Whisper for transcription, a 25 MB MP3 file encoded at 32 kbps is just under two hours long (about 109.25 minutes).

Whisper is $0.006 / minute, so this theoretical 25 MB file would cost about $0.66 to process.
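
As a quick back-of-the-envelope check on those numbers (just the arithmetic, not OpenAI's billing logic):

// Rough duration and cost estimate for a 25 MB MP3 at 32 kbps.
const fileSizeMB = 25;
const bitrateKbps = 32;
const pricePerMinute = 0.006; // Whisper API price in USD

const seconds = (fileSizeMB * 1024 * 1024 * 8) / (bitrateKbps * 1000); // ~6553.6 s
const minutes = seconds / 60;                                          // ~109.2 min
const cost = minutes * pricePerMinute;                                 // ~$0.66

console.log(minutes.toFixed(1), cost.toFixed(2));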

Let’s say you’ve got 10 hours of audio, i.e. 600 minutes. You could break that up into 6 chunks, but instead of doing six 100-minute chunks, you’d do two 104-minute chunks (first and last) and four 108-minute chunks. That pads every chunk with 4 extra minutes at each internal boundary, so adjacent chunks share audio at every transition.

This would help ensure the model has the surrounding context going into the new content. It also means you don’t need to worry so much about cutting mid-sentence, because with that much overlap you should have multiple complete sentences repeated in both chunks to line things up with.

This does add a total of 40 minutes of transcription (about 7%), which increases cost and processing time.

That said, this was just an example. If you went with 1 minute of overlap in this situation, you’d be adding a total of 10 minutes instead of 40.

Anyway, I think I’ve made the point I was hoping to.

Just chop up the file with enough overlap that you get at least a couple of full sentences repeated at the end of one file and the start of the next.
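
For the stitching itself, one approach (just a sketch of the idea, not the only way to do it) is to find the longest run of words at the end of one chunk's transcript that also begins the next chunk's transcript, then drop the duplicate:

// Merge two overlapping transcripts by removing the words duplicated where
// the end of `first` repeats at the start of `second`.
function mergeOverlapping(first: string, second: string): string {
  const a = first.trim().split(/\s+/);
  const b = second.trim().split(/\s+/);

  // Try the longest possible overlap first, shrinking until the words match.
  const maxOverlap = Math.min(a.length, b.length);
  for (let len = maxOverlap; len > 0; len--) {
    const tail = a.slice(a.length - len).join(" ").toLowerCase();
    const head = b.slice(0, len).join(" ").toLowerCase();
    if (tail === head) {
      return [...a, ...b.slice(len)].join(" ");
    }
  }
  // No overlap found: fall back to plain concatenation.
  return [...a, ...b].join(" ");
}

console.log(
  mergeOverlapping(
    "the plan is simple we will meet tomorrow",
    "we will meet tomorrow at noon and finish the report"
  )
);
// -> "the plan is simple we will meet tomorrow at noon and finish the report"

In practice Whisper may punctuate or capitalize the overlapping region slightly differently in each chunk, so a fuzzier comparison (stripping punctuation, or accepting a mostly-matching run of words) tends to be more robust.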

Honestly, I’m really surprised OpenAI doesn’t offer this in the API endpoint by default with a parameter for the amount of overlap.

This might be helpful:

import os
import math
import eyed3
import logging
from pydub import AudioSegment

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

MAX_SIZE_MB = 25

def calculate_segment_duration_and_num_segments(duration_seconds, overlap_seconds, max_size, bitrate_kbps):
    """Calculate the number of segments and the duration of each segment (including overlap)."""
    # Seconds of audio that fit in max_size MB at the given bitrate (conservative estimate).
    seconds_for_max_size = (max_size * 8 * 1024) / bitrate_kbps
    num_segments = max(2, int(duration_seconds / seconds_for_max_size) + 1)
    # Adjacent segments share overlap_seconds of audio, so together the segments must
    # cover the original duration plus (num_segments - 1) overlaps.
    segment_duration = (duration_seconds + (num_segments - 1) * overlap_seconds) / num_segments
    return num_segments, segment_duration

def construct_file_names(path_to_mp3, num_segments):
    """Construct new file names for the segments of an audio file."""
    directory = os.path.dirname(path_to_mp3)
    base_name = os.path.splitext(os.path.basename(path_to_mp3))[0]
    padding = max(1, int(math.ceil(math.log10(num_segments))))
    new_names = [os.path.join(directory, f"{base_name}_{str(i).zfill(padding)}.mp3") for i in range(1, num_segments + 1)]
    return new_names

def split_mp3(path_to_mp3, overlap_seconds, max_size=MAX_SIZE_MB):
    """Split an mp3 file into segments."""
    if not os.path.exists(path_to_mp3):
        raise ValueError(f"File {path_to_mp3} does not exist.")
    audio_file = eyed3.load(path_to_mp3)
    if audio_file is None:
        raise ValueError(f"File {path_to_mp3} is not a valid mp3 file.")
    duration_seconds = audio_file.info.time_secs
    bitrate_kbps = audio_file.info.bit_rate[1]  # info.bit_rate is a (is_vbr, kbps) tuple
    file_size_MB = os.path.getsize(path_to_mp3) / (1024 * 1024)
    if file_size_MB < max_size:
        logging.info("File is less than maximum size, no action taken.")
        return path_to_mp3
    num_segments, segment_duration = calculate_segment_duration_and_num_segments(duration_seconds, overlap_seconds, max_size, bitrate_kbps)
    new_file_names = construct_file_names(path_to_mp3, num_segments)
    original_audio = AudioSegment.from_mp3(path_to_mp3)
    start = 0
    for i in range(num_segments):
        if i == num_segments - 1:
            # Last segment runs to the end of the file so no audio is dropped.
            segment = original_audio[start:]
        else:
            end = start + segment_duration * 1000  # pydub slices in milliseconds
            segment = original_audio[start:int(end)]
        segment.export(new_file_names[i], format="mp3")
        # Advance by the non-overlapping portion so consecutive segments share overlap_seconds.
        start = int(start + (segment_duration - overlap_seconds) * 1000)
    logging.info(f"Split into {num_segments} sub-files.")
    return new_file_names

I'm kind of confused about what you mean by "Because then you would send the headers for each request". Do you mean adding headers for each chunk?

Yes, that’s pretty much what I meant.
Split the file into two parts. Send the first part to the API, then the second part.