I’m working on a FastAPI app for transcribing videos from different platforms. I need a speech-to-text model that:
- Supports multiple languages
- Works well on CPU (no GPU available)
- Transcribes quickly
I’ve tried Whisper (base and tiny), but I’d like this community’s suggestions for the best CPU-friendly, multilingual model, whether from OpenAI or another open-source project.
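To keep speed numbers comparable across models on the same CPU, one option is to report the real-time factor (processing time divided by audio duration, lower is faster). Below is a minimal sketch of that measurement. It assumes nothing about any particular library: `real_time_factor` and `fake_transcribe` are hypothetical names, and `fake_transcribe` is only a stand-in to be swapped for whichever model call is being tested.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[str], str],
    audio_path: str,
    audio_seconds: float,
) -> float:
    """Return wall-clock processing time divided by audio duration.

    A value below 1.0 means the model transcribes faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    transcribe(audio_path)  # the transcript itself is ignored here
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def fake_transcribe(path: str) -> str:
    """Hypothetical placeholder; replace with a real model call."""
    time.sleep(0.05)  # simulate some processing work
    return "placeholder transcript"


if __name__ == "__main__":
    # "sample.wav" and the 10-second duration are illustrative assumptions.
    rtf = real_time_factor(fake_transcribe, "sample.wav", audio_seconds=10.0)
    print(f"real-time factor: {rtf:.3f}")
```

Running each candidate on the same clip and machine, and noting the model size and thread count alongside the number, should make reported results easier to compare.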
Any suggestions or benchmarks would really help. Thanks!