Running Whisper on AWS GPU - Memory Error

I am running Whisper on an AWS EC2 g3s.xlarge. I have a bunch of long (~1 hour) audio files and want to use the Whisper medium model to transcribe them. My code works fine for the first file and then crashes with the following error message:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 7.43 GiB total capacity; 6.72 GiB already allocated; 15.44 MiB free; 6.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Does anyone know how I can handle this?
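Since the crash happens on the second file, the usual suspect is GPU memory that PyTorch keeps cached between calls. A common workaround is to drop references to each result and call `gc.collect()` plus `torch.cuda.empty_cache()` between files. A minimal sketch (the `transcribe_all` wrapper is my own, not part of Whisper):

```python
import gc

def transcribe_all(model, audio_filepaths):
    """Transcribe each file in turn, releasing GPU memory between files."""
    results = []
    for path in audio_filepaths:
        results.append(model.transcribe(path, fp16=False))
        gc.collect()  # drop Python references to intermediate tensors
        try:
            import torch
            if torch.cuda.is_available():
                # return cached-but-unused blocks to the CUDA driver
                torch.cuda.empty_cache()
        except ImportError:
            pass  # CPU-only environment; nothing to free
    return results
```

If that still OOMs, the medium model may simply be too close to the g3s.xlarge's ~8 GiB limit for hour-long files; the error message's hint about `PYTORCH_CUDA_ALLOC_CONF` / `max_split_size_mb` is also worth trying.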

Use the API and ditch the EC2 instance!

This is how I am using it now:

model = whisper.load_model("medium")
whisper_results = model.transcribe(audio_filepaths[i], fp16=False)

Is there a difference between using the above code and using the code from the new Whisper API?

Also, I see that files larger than 25 MB need to be split up in order to use the new API.

The API version of Whisper uses the large model, and yes, you have to break up files larger than 25 MB.
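For the splitting, one simple approach is to cut the recording into equal-length chunks sized to the 25 MB limit. A minimal sketch of the arithmetic, assuming a roughly constant bitrate (`chunk_plan` is a hypothetical helper; the actual cutting would be done with something like ffmpeg or pydub):

```python
import math

API_LIMIT_BYTES = 25 * 1024 * 1024  # the API's 25 MB upload limit

def chunk_plan(file_size_bytes, duration_s, limit=API_LIMIT_BYTES):
    """Return (number of chunks, seconds per chunk) so each chunk fits under the limit."""
    n_chunks = max(1, math.ceil(file_size_bytes / limit))
    return n_chunks, duration_s / n_chunks
```

For example, a 60 MB one-hour file would be cut into three 20-minute chunks.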