Alibaba Outlines How LLMs Can Improve Speech-to-Text AI Translation

Next, a specialized prompt is built for the LLM. It contains the transcription, the translation, and retrieved examples that may show how similar inputs were handled or refined in the past, giving the model context and guidance.
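The article does not say how the examples are retrieved or how the prompt is worded. The following is a minimal sketch of the prompt-construction step. It assumes a simple word-overlap (Jaccard) score for retrieval and a plain-text template. `Example`, `retrieve_examples`, and `build_prompt` are hypothetical names for illustration, not the researchers' code.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A past refinement case (hypothetical structure, not from the paper)."""
    transcription: str
    translation: str
    refined_transcription: str
    refined_translation: str


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased word sets (an assumed retrieval score)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def retrieve_examples(transcription: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Return the k pool entries whose transcriptions overlap most with the input."""
    ranked = sorted(
        pool,
        key=lambda ex: token_overlap(transcription, ex.transcription),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(transcription: str, translation: str, examples: list[Example]) -> str:
    """Assemble a plain-text prompt: instruction, demonstrations, then the new input."""
    lines = ["Refine the speech transcription and its translation.", ""]
    for i, ex in enumerate(examples, start=1):
        lines += [
            f"Example {i}:",
            f"Transcription: {ex.transcription}",
            f"Translation: {ex.translation}",
            f"Refined transcription: {ex.refined_transcription}",
            f"Refined translation: {ex.refined_translation}",
            "",
        ]
    lines += [
        f"Transcription: {transcription}",
        f"Translation: {translation}",
        "Answer with a 'Refined transcription:' line and a 'Refined translation:' line.",
    ]
    return "\n".join(lines)
```

Placing the retrieved demonstrations before the new input follows ordinary few-shot prompting. A real system would more likely retrieve with embeddings than with word overlap.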

With this prompt, the LLM is asked to produce two outputs: a refined transcription, which improves on the initial transcription and may correct errors or enhance clarity, and a refined translation, which improves the fluency, accuracy, and coherence of the original translation.
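Before the model's reply can be used, the two refined outputs have to be separated. The following sketch assumes a hypothetical reply format with "Refined transcription:" and "Refined translation:" labels. The format the researchers actually used may differ, and `parse_refinement` is an illustrative name.

```python
import re

# Hypothetical reply format: a labeled transcription followed by a labeled translation.
_REPLY_PATTERN = re.compile(
    r"Refined transcription:\s*(?P<asr>.*?)\s*Refined translation:\s*(?P<mt>.*)",
    re.DOTALL | re.IGNORECASE,
)


def parse_refinement(reply: str) -> tuple[str, str]:
    """Split an LLM reply into (refined_transcription, refined_translation).

    Raises ValueError if either labeled field is missing.
    """
    match = _REPLY_PATTERN.search(reply)
    if match is None:
        raise ValueError("reply does not contain both refined fields")
    return match.group("asr").strip(), match.group("mt").strip()
```

When a reply is malformed, the function raises an error instead of returning partial output. The caller can then fall back to the unrefined transcription and translation.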

Although this additional refinement step introduces some latency, the researchers highlight that the substantial quality improvements in speech translation justify the extra processing time.

The researchers tested their approach on the MuST-C and CoVoST 2 datasets across multiple translation directions, from English into German, French, Spanish, Catalan, Arabic, and Turkish.

They found that the refinement process improves translation accuracy by addressing errors in both transcriptions and translations. “By leveraging information from both the automatic transcription and translation, our approach significantly enhances the overall quality of speech translation,” the researchers said. They also highlighted that refining transcription and translation together consistently outperforms refining translation alone.

Looking ahead, the researchers plan to explore whether using spoken language directly as input can further improve the refinement process. Rather than relying solely on written text, they believe that analyzing the speech itself could help identify and correct errors more effectively.

The code and datasets are available on GitHub.

Authors: Huaixia Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo
