I am running Whisper on an AWS EC2 g3s.xlarge instance. I have a number of long (~1 hour) audio files and want to transcribe them with the Whisper Medium model. My code works fine for the first file, but then it crashes with the following error message:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 7.43 GiB total capacity; 6.72 GiB already allocated; 15.44 MiB free; 6.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Does anyone know how I can handle this?
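For context, my processing loop looks roughly like the sketch below, with the between-file cleanup (`del`, `gc.collect()`, `torch.cuda.empty_cache()`) I've been considering. `transcribe_fn` is a placeholder for my actual `model.transcribe` call, so the cleanup pattern can be shown on its own:

```python
import gc


def transcribe_all(paths, transcribe_fn):
    """Transcribe files one at a time, releasing each result before the next.

    transcribe_fn is a stand-in for the real model call
    (e.g. whisper_model.transcribe) so the loop structure is self-contained.
    """
    texts = []
    for path in paths:
        result = transcribe_fn(path)  # may hold references to large GPU tensors
        texts.append(result["text"])  # keep only the text, not the full result
        del result                    # drop references to per-file tensors
        gc.collect()                  # force Python to reclaim them now
        try:
            import torch
            if torch.cuda.is_available():
                # return cached allocator blocks to the driver between files
                torch.cuda.empty_cache()
        except ImportError:
            pass  # torch not installed in this environment; skip GPU cleanup
    return texts
```

Is explicit cleanup like this the right approach, or is something else holding onto GPU memory between files?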