Mistral Debuts Open-Source Voxtral for AI Speech Translation and Transcription

On July 15, 2025, French AI company Mistral released its first family of open-source speech models, Voxtral, designed for transcription, speech translation, summarization, question-answering, and audio understanding in multiple languages. 

The family comes in two sizes: Voxtral Small, a 24.3B-parameter variant for production-scale applications, and Voxtral Mini, a 4.7B-parameter variant suited for local and edge deployments.

There is also an ultra-light, fast, and low-cost version of the smaller model, called Voxtral Mini Transcribe, optimized for transcription-only use cases.

Voxtral Small and Mini can process audio files up to 40 minutes long, while Voxtral Mini Transcribe supports files up to 30 minutes. That capacity surpasses most open and closed models and makes the Voxtral family suitable for multilingual meeting transcription, voice-based chat interfaces, and long-form audio summarization.

“State-of-the-Art” Results, Mistral Says

In Mistral’s testing, Voxtral Small and Voxtral Mini Transcribe achieved state-of-the-art transcription results, beating OpenAI’s Whisper and GPT-4o mini Transcribe as well as Google’s Gemini 2.5 Flash, and performing competitively with ElevenLabs Scribe.

In speech translation, Voxtral Small outperformed Gemini 2.5 Flash and GPT-4o mini Audio across several language pairs, including English↔French, Spanish↔English, and German↔English.

Mistral has released model weights for Voxtral Small and Voxtral Mini on Hugging Face, where users can download them for free. The models can also be tested in Mistral’s chatbot, Le Chat, and a live demo is available.

The company has also released three new speech understanding benchmarks — synthesized speech evaluations generated from popular text evaluation datasets. “We are releasing the synthesized evaluations under a permissive license and encourage their adoption as standard benchmarks for speech understanding,” the team said.