Best multilingual speech-to-text models for CPU

I’m working on a FastAPI app for transcribing videos from different platforms. I need a speech-to-text model that:

  • Supports multiple languages
  • Works well on CPU (no GPU available)
  • Transcribes quickly

I’ve tried Whisper (base and tiny). What would this community recommend as the best CPU-friendly multilingual model, whether it comes from OpenAI or from another open-source project?

Any suggestions or benchmarks would really help. Thanks!