It means you need to encode it to a voice format that doesn’t waste so much data.
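To put rough numbers on that, here is a small sketch comparing uncompressed audio against a low-bitrate speech codec. The choice of Opus, the 24 kbps figure, the mono downmix and the file names are my own assumptions for illustration, not something the API requires; check that whatever you upload to accepts the container you pick.

```python
import shlex


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def estimated_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Approximate size in megabytes (10^6 bytes) of a constant-bitrate stream."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000


def ffmpeg_opus_command(src: str, dst: str, bitrate_kbps: int = 24) -> str:
    """Build (but do not run) an ffmpeg command that re-encodes speech as mono Opus."""
    args = [
        "ffmpeg", "-i", src,
        "-vn",                   # drop any video stream
        "-ac", "1",              # downmix to mono (assumption: single-speaker speech)
        "-c:a", "libopus",       # Opus encoder
        "-b:a", f"{bitrate_kbps}k",
        "-application", "voip",  # libopus mode tuned for speech
        dst,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    one_hour = 3600
    wav_kbps = pcm_bitrate_kbps(44100, 16, 2)  # CD-quality stereo WAV
    print(f"WAV : {estimated_size_mb(one_hour, wav_kbps):.1f} MB per hour")
    print(f"Opus: {estimated_size_mb(one_hour, 24):.1f} MB per hour")
    print(ffmpeg_opus_command("talk.wav", "talk.ogg"))
```

The sizes are estimates from bitrate alone; real files add a little container overhead.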
Whisper is open source, meaning anyone can use or reimplement it, and it is light enough to run on a 4GB GPU, or slowly on a CPU.
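As a minimal sketch of running it yourself, assuming the `openai-whisper` Python package and PyTorch are installed: the model name `"small"` and the helper names are my own placeholders, and which model size fits your card is something you would need to check.

```python
def choose_device(cuda_available: bool) -> str:
    """Use the GPU when one is available, otherwise fall back to (slower) CPU."""
    return "cuda" if cuda_available else "cpu"


def transcribe_locally(path: str, model_name: str = "small") -> str:
    """Transcribe an audio file with a locally loaded Whisper model."""
    # Imported here so the helper above works even without these packages installed.
    import torch
    import whisper  # the openai-whisper package

    device = choose_device(torch.cuda.is_available())
    model = whisper.load_model(model_name, device=device)
    # Half precision is only useful on GPU; on CPU run in full precision.
    result = model.transcribe(path, fp16=(device == "cuda"))
    return result["text"]
```

Calling `transcribe_locally("talk.ogg")` would download the model weights on first use and return the transcript as a single string.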
A person could, for example, run it on Google Cloud TPUs (Google's tensor-processing ASICs) and transcribe 50x faster than OpenAI can, as in https://huggingface.co/spaces/sanchit-gandhi/whisper-jax, or make an API for it.
Other variants that you can run on your own hardware, if it is capable enough, offer word-level timestamps or are geared towards video transcription.
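For the word-level timestamps, one option I know of is the `word_timestamps=True` argument in the reference `openai-whisper` package; other forks expose similar data in their own shapes. A rough sketch, where the helper names and the `"small"` model are my own placeholders:

```python
def transcribe_with_words(path: str, model_name: str = "small") -> dict:
    """Run Whisper locally and ask for per-word timing information."""
    import whisper  # the openai-whisper package

    model = whisper.load_model(model_name)
    return model.transcribe(path, word_timestamps=True)


def format_words(result: dict) -> list[str]:
    """Flatten a Whisper result into 'start-end word' lines, one per word."""
    lines = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            lines.append(f"{w['start']:.2f}-{w['end']:.2f} {w['word'].strip()}")
    return lines


if __name__ == "__main__":
    # Hand-written sample shaped like Whisper's output, so this runs without a model.
    sample = {
        "segments": [
            {"words": [
                {"word": " Hello", "start": 0.0, "end": 0.42},
                {"word": " world", "start": 0.42, "end": 0.9},
            ]}
        ]
    }
    for line in format_words(sample):
        print(line)
```

Lines like these are easy to turn into subtitle cues, which is where the video-oriented variants come in.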
The OpenAI API runs Whisper large-v2, but it could be upgraded to large-v3 without you knowing, as the newly released model is the same size.