Microsoft Says This New Voice Conversion Feature Will Improve AI Dubbing

While Microsoft has long supported synthetic speech, the new feature goes a step further: it shifts the process from text-to-speech to speech-to-speech.

‘Consistent Experience’ in Multilingual Dubbing

Among Microsoft’s key use cases is multilingual dubbing. Localized audio content often varies in voice quality and style across languages. Microsoft presents voice conversion as a potential solution: all dubbed audio can be converted into a single target voice, “ensuring a consistent experience across all languages.”

Microsoft says its system outperformed a leading competitor in internal tests, especially in Mandarin, where it delivered clearer and more natural-sounding speech. In English, performance was on par with the competitor.

Voice conversion is also being added to Microsoft’s Custom Voice offering, now in private preview. This allows companies to apply voice conversion to their own branded synthetic voices, preserving the tone and emotion of the original audio while using a familiar voice identity. It requires only a small amount of training data, making it a “quick solution for dynamic voice customization,” according to Microsoft.

Microsoft has published implementation details and technical guidance for users interested in exploring the feature.
