I wanted to share that I’ve just open sourced a family of streaming models and a library to run them. Our largest model has 245 million parameters and achieves a 6.65% word error rate on the HuggingFace OpenASR Leaderboard, compared to 7.44% for Whisper Large v3, which has 1.5 billion parameters. Here’s some more information; I’d love to hear if there’s interest in this direction from you all.
I can’t include links unfortunately, probably for good spam protection reasons, but if you go to my blog at petewarden dot com, you can check out the top post.
Thanks,
Pete