NVIDIA, Microsoft, ElevenLabs Top New Automatic Speech Recognition Leaderboard

Hugging Face has teamed up with NVIDIA, Mistral AI, and the University of Cambridge to launch the Open ASR Leaderboard, a public benchmark for automatic speech recognition (ASR).

The researchers noted that while the ASR space is growing fast, with new models and players entering the market, comparing systems remains a challenge because of inconsistent datasets and evaluation methods.

The interactive leaderboard compares more than 60 speech recognition models, both open-source and proprietary, across 11 datasets in English, German, French, Italian, Spanish, and Portuguese.

It also introduces a dedicated long-form transcription track, reflecting real-world use cases such as meetings and podcasts. “A separate long-form evaluation is necessary because some models employ chunking strategies to reduce inference time, which can in turn affect transcription quality,” the researchers explained.
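
To make the chunking idea concrete, here is a minimal sketch of how a long recording might be split into overlapping windows before transcription. The function name `chunk_spans`, the 30-second window, and the 5-second overlap are illustrative assumptions, not values taken from the leaderboard or from any particular model.

```python
from typing import List, Tuple


def chunk_spans(total_s: float, chunk_s: float = 30.0,
                overlap_s: float = 5.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping audio chunks.

    Hypothetical helper: the window and overlap defaults are assumptions
    chosen for illustration only.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")

    spans: List[Tuple[float, float]] = []
    step = chunk_s - overlap_s  # how far each new window advances
    start = 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:  # the last window reached the end of the audio
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second recording split into 30-second windows with 5 s of overlap.
    for start, end in chunk_spans(70.0):
        print(f"{start:5.1f}s - {end:5.1f}s")
```

Each window would be transcribed independently and the pieces stitched back together. Words that fall on a boundary can be cut or duplicated in that process, which is one plausible way chunking could affect transcription quality.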

Who’s Leading the Pack

NVIDIA’s NeMo Canary Qwen 2.5b tops the English leaderboard with a 5.63% word error rate (WER), followed by IBM’s Granite Speech 3.3, Microsoft’s Phi-4 Multimodal Instruct, and NVIDIA’s Parakeet.
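
For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` is illustrative, and the sketch assumes plain whitespace tokenization with no text normalization, so it will not necessarily reproduce the leaderboard's own scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count.

    Assumption: words are split on whitespace and compared exactly
    (no lowercasing, punctuation stripping, or other normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"WER: {word_error_rate(ref, hyp):.2%}")
```

Under this definition, a 5.63% WER corresponds to roughly 5 to 6 word errors for every 100 words in the reference.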

In multilingual transcription, Microsoft’s Phi-4 Multimodal Instruct and NVIDIA’s Canary 1B v2 perform best, with average WERs between 3% and 5% across European languages. Yet the data reveals a familiar trade-off: models optimized for English tend to generalize less well to other languages, while multilingual systems slightly trail in English accuracy.

For long-form transcription, ElevenLabs posts the most accurate results, followed closely by RevAI and Speechmatics. Among open-source models, OpenAI’s Whisper Large v3 ranks highest, and its distilled versions offer faster inference.

Open Collaboration

The entire leaderboard infrastructure is open for contributions. Developers can submit new models or datasets through GitHub pull requests, and results update automatically on the Hugging Face Hub.

Authors: Vaibhav Srivastav, Steven Zheng, Eric Bezzam, Eustache Le Bihan, Nithin Koluguri, Piotr Zelasko, Somshubra Majumdar, Adel Moumen, and Sanchit Gandhi