How can I use Whisper to handle a long video?

It means you need to re-encode the audio into a compressed speech format that doesn't waste so much data.
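Here is a minimal sketch of that re-encoding step. It assumes ffmpeg is installed with libopus support, that the hosted API accepts Ogg Opus uploads, and that it caps uploads at 25 MB (check the current docs). The helper names `ffmpeg_speech_cmd`, `estimated_size_bytes` and `encode` are hypothetical. It drops the video stream and writes mono 16 kHz Opus at 24 kbps. Whisper works on 16 kHz mono audio internally, so little useful information should be lost.

```python
import shutil
import subprocess
import sys

# Assumed upload cap for the hosted API (25 MB); verify against current docs.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def ffmpeg_speech_cmd(video_path, out_path, bitrate_kbps=24):
    """Build an ffmpeg command that discards the video stream and re-encodes
    the audio as low-bitrate mono Opus, a codec designed with speech in mind."""
    return [
        "ffmpeg", "-y",            # overwrite output without asking
        "-i", video_path,
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to mono
        "-ar", "16000",            # 16 kHz, the rate Whisper resamples to anyway
        "-c:a", "libopus",
        "-b:a", f"{bitrate_kbps}k",
        "-application", "voip",    # libopus mode tuned for speech
        out_path,
    ]


def estimated_size_bytes(duration_s, bitrate_kbps):
    """Rough audio payload size in bytes; ignores container overhead."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


def encode(video_path, out_path, bitrate_kbps=24):
    """Run ffmpeg; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_speech_cmd(video_path, out_path, bitrate_kbps), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python shrink.py lecture.mp4 lecture.ogg
    encode(sys.argv[1], sys.argv[2])
```

For scale: one hour at 24 kbps is 3600 × 24,000 / 8 = 10,800,000 bytes (about 10.8 MB) before container overhead, versus about 115 MB for the same hour as 16-bit 16 kHz mono WAV.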

https://community.openai.com/t/sending-an-hours-worth-of-audio-through-whisper-using-node-js/450869/8

Whisper is open source, meaning anyone can use or modify it, and its smaller models are light enough to run on a 4 GB GPU, or slowly on a CPU.
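A minimal sketch of running it locally with the open-source `openai-whisper` package (`pip install openai-whisper`, plus ffmpeg on PATH so it can read the audio track straight from a video file). The choice of the `small` model is an assumption meant to fit a modest GPU; the SRT helpers `srt_timestamp` and `to_srt` are hypothetical names written for this example, not part of the package.

```python
import sys


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments):
    """Turn Whisper-style segments (dicts with start, end, text) into SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)  # blank line between cues


def main(video_path, out_path, model_name="small"):
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    # transcribe() walks the whole file in 30-second windows internally,
    # so long inputs need no manual chunking here.
    result = model.transcribe(video_path)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(to_srt(result["segments"]))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Running locally also sidesteps the upload size limit entirely, since the file never leaves your machine.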

A person could, for example, use Google Cloud TPUs (Tensor Processing Unit ASICs) and transcribe 50x faster than OpenAI can.

See the Whisper JAX demo at https://huggingface.co/spaces/sanchit-gandhi/whisper-jax, or host it yourself behind your own API.

Other variants you can run on your own capable hardware offer word-level timestamps or are tailored to video transcription.
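As one illustration, here is a sketch using faster-whisper (`pip install faster-whisper`), a reimplementation that can return per-word timings via `word_timestamps=True`. The `large-v2` model choice and the `group_words` caption helper are assumptions for this example; grouping a fixed number of words per caption line is just one simple policy.

```python
import sys


def group_words(words, max_words=7):
    """Group (start, end, text) word tuples into caption lines of at most
    max_words words, each spanning its first word's start to its last word's end."""
    captions = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = "".join(w[2] for w in chunk).strip()  # words carry leading spaces
        captions.append((chunk[0][0], chunk[-1][1], text))
    return captions


def main(path, model_size="large-v2"):
    from faster_whisper import WhisperModel  # imported lazily

    model = WhisperModel(model_size)  # device/compute type left at defaults
    segments, _info = model.transcribe(path, word_timestamps=True)
    # segments is a generator; iterate it once and flatten to word tuples
    words = [(w.start, w.end, w.word) for seg in segments for w in (seg.words or [])]
    for start, end, text in group_words(words):
        print(f"[{start:7.2f} -> {end:7.2f}] {text}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```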

The OpenAI API runs Whisper large-v2, but it could be upgraded to large-v3 without you knowing, since the newly released model is the same size.
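For comparison, a minimal sketch of the hosted route with the official `openai` Python package (v1 or later, reading `OPENAI_API_KEY` from the environment). The API model identifier is `whisper-1`, which does not itself name a checkpoint version. The 25 MB cap and the helper names `fits_upload_limit` and `transcribe_via_api` are assumptions for this example.

```python
import os
import sys

# Assumed upload cap for the hosted API (25 MB); verify against current docs.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def fits_upload_limit(size_bytes, limit_bytes=MAX_UPLOAD_BYTES):
    """True if a file of size_bytes should be accepted by the upload cap."""
    return size_bytes <= limit_bytes


def transcribe_via_api(path):
    from openai import OpenAI  # imported lazily so the check above works without it

    size = os.path.getsize(path)
    if not fits_upload_limit(size):
        raise ValueError(f"{path} is {size} bytes; re-encode or split it first")
    client = OpenAI()
    with open(path, "rb") as f:
        # "whisper-1" is the model id; which checkpoint serves it is up to OpenAI
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text


if __name__ == "__main__" and len(sys.argv) == 2:
    print(transcribe_via_api(sys.argv[1]))
```

Checking the size locally first avoids spending an upload on a request the API would reject.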
