Next, a specialized prompt is built for the LLM. It contains the transcription, the translation, and retrieved examples that may show how similar cases were refined in the past, giving the model context and guidance.
Given this prompt, the LLM is asked to produce two outputs: a refined transcription, which corrects errors in the initial transcription or improves its clarity, and a refined translation, which improves the fluency, accuracy, and coherence of the initial translation.
Although this additional refinement step introduces some latency, the researchers highlight that the substantial quality improvements in speech translation justify the extra processing time.
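The prompt-and-parse flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prompt wording, the `Example` structure, the `Refined transcription:` / `Refined translation:` output labels, and the function names are hypothetical and are not taken from the researchers' released code. The LLM call itself is left out; any chat or completion client could sit between the two functions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    """A retrieved refinement example (hypothetical structure)."""
    transcription: str
    translation: str
    refined_transcription: str
    refined_translation: str


def build_prompt(transcription: str, translation: str,
                 examples: List[Example], target_lang: str) -> str:
    """Assemble a few-shot prompt asking for joint refinement.

    The instruction text and line labels are assumptions for illustration.
    """
    parts = [
        f"Refine the transcription and its {target_lang} translation.",
        "Answer with two lines: 'Refined transcription: ...' "
        "and 'Refined translation: ...'.",
        "",
    ]
    # Each retrieved example shows an input pair and its refined pair.
    for ex in examples:
        parts += [
            f"Transcription: {ex.transcription}",
            f"Translation: {ex.translation}",
            f"Refined transcription: {ex.refined_transcription}",
            f"Refined translation: {ex.refined_translation}",
            "",
        ]
    # The pair to refine comes last, with the refined fields left to the model.
    parts += [
        f"Transcription: {transcription}",
        f"Translation: {translation}",
    ]
    return "\n".join(parts)


def parse_response(response: str) -> Tuple[str, str]:
    """Extract (refined transcription, refined translation) from the reply."""
    refined_asr = None
    refined_mt = None
    for line in response.splitlines():
        line = line.strip()
        if line.startswith("Refined transcription:"):
            refined_asr = line.split(":", 1)[1].strip()
        elif line.startswith("Refined translation:"):
            refined_mt = line.split(":", 1)[1].strip()
    if refined_asr is None or refined_mt is None:
        raise ValueError("response is missing a refined field")
    return refined_asr, refined_mt
```

In this sketch, both refined fields are requested in a single response, which mirrors the joint setup the article describes; a translation-only variant would simply drop the transcription line from the requested output.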
Joint Refinement Wins
The researchers tested their approach on the MuST-C and CoVoST 2 datasets across multiple translation directions (English > German, French, Spanish, Catalan, Arabic, and Turkish).
They found that the refinement process improves translation accuracy by addressing errors in both transcriptions and translations. “By leveraging information from both the automatic transcription and translation, our approach significantly enhances the overall quality of speech translation,” the researchers said. They also highlighted that refining transcription and translation together consistently outperforms refining translation alone.
Looking ahead, the researchers plan to explore whether using the spoken input directly can further improve the refinement of speech translation. Rather than relying only on written text, they believe that analyzing the speech itself could help identify and correct errors more effectively.
The code and datasets are available on GitHub.
Authors: Huaixia Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, Lifan Guo