While Microsoft has long supported synthetic speech, the new feature goes a step further: it shifts the process from text-to-speech to speech-to-speech.
‘Consistent Experience’ in Multilingual Dubbing
Among Microsoft’s key use cases is multilingual dubbing. Localized audio content often varies in voice quality and style across languages. With voice conversion, Microsoft offers a potential solution by enabling the conversion of all dubbed audio into a single, consistent target voice, “ensuring a consistent experience across all languages.”
Microsoft says its system outperformed a leading competitor in internal tests, especially in Mandarin, where it delivered clearer and more natural-sounding speech. In English, performance was on par with the competitor's.
Voice conversion is also being added to Microsoft’s Custom Voice offering, now in private preview. This allows companies to apply voice conversion to their own branded synthetic voices, preserving the tone and emotion of the original audio while using a familiar voice identity. It requires only a small amount of training data, making it a “quick solution for dynamic voice customization,” according to Microsoft.
Microsoft has published implementation details and technical guidance for users interested in exploring the feature.