Mistral Debuts Open-Source Voxtral for AI Speech Translation and Transcription

Voxtral Small and Mini can process audio files up to 40 minutes long, while Voxtral Mini Transcribe supports files up to 30 minutes. That capacity surpasses most open and closed models and makes the Voxtral models suitable for multilingual meeting transcription, voice-based chat interfaces, and long-form audio summarization.

“State-of-the-Art” Results, Mistral Says

During testing, Voxtral Small and Voxtral Mini Transcribe achieved state-of-the-art transcription results, beating OpenAI’s Whisper and GPT-4o mini Transcribe as well as Google’s Gemini 2.5 Flash, and performing competitively with ElevenLabs Scribe.

In speech translation, Voxtral Small outperformed Gemini 2.5 Flash and GPT-4o mini Audio across several language pairs, including English↔French, Spanish↔English, and German↔English.

Mistral has released model weights for Voxtral Small and Voxtral Mini on Hugging Face. Users can try Voxtral for free by downloading the weights from Hugging Face or by testing the models in Mistral’s chatbot, Le Chat. A live demo is also available.

The company has also released three new speech understanding benchmarks — synthesized speech evaluations generated from popular text evaluation datasets. “We are releasing the synthesized evaluations under a permissive license and encourage their adoption as standard benchmarks for speech understanding,” the team said.
