To address these challenges, the researchers drew on conventional simultaneous translation models, combining a “wait-k” policy with incremental decoding to design a mixture policy tailored to LLMs.
The “wait-k” part of the mixture policy instructs the computer on how much to read before starting to translate, while the “incremental decoding” part instructs the computer on how much to translate before reading more.
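The read/write alternation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `chunk` parameter, and the toy translator (which just uppercases source words) are assumptions made for illustration, and a real system would call an LLM where `toy_translate_step` is used.

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # ("READ", source_word) or ("WRITE", target_token)


def mixture_policy(
    source_words: List[str],
    k: int,
    chunk: int,
    translate_step: Callable[[List[str], List[str], bool], List[str]],
) -> List[Action]:
    """Simulate a wait-k read schedule with incremental writing.

    Assumed behaviour: read k words up front, then one more word per
    round; after each read phase, write at most `chunk` new tokens,
    or everything that remains once the source is exhausted.
    """
    actions: List[Action] = []
    target: List[str] = []
    read = 0
    finished = False
    while not finished:
        # READ phase: k words on the first round, one word afterwards.
        need = k if read == 0 else 1
        for _ in range(need):
            if read < len(source_words):
                actions.append(("READ", source_words[read]))
                read += 1
        source_done = read == len(source_words)

        # WRITE phase: extend the target given the source read so far.
        new_tokens = translate_step(source_words[:read], target, source_done)
        if not source_done:
            new_tokens = new_tokens[:chunk]
        for token in new_tokens:
            actions.append(("WRITE", token))
            target.append(token)
        finished = source_done
    return actions


def toy_translate_step(src: List[str], tgt: List[str], done: bool) -> List[str]:
    """Hypothetical stand-in for an LLM: 'translates' by uppercasing."""
    return [word.upper() for word in src[len(tgt):]]


schedule = mixture_policy(["a", "b", "c", "d"], k=2, chunk=1,
                          translate_step=toy_translate_step)
```

With `k=2` and `chunk=1`, the sketch reads two words before writing anything, then alternates one read with one write until the source ends, at which point it writes the remainder in one go.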
Simultaneous Fine-Tuning
While this policy enabled LLMs to perform simultaneous translation to some extent, the authors explored further enhancements through Simultaneous Fine-Tuning (SFT). They observed that the initial policy could partially mitigate issues like hallucinations caused by incomplete source context but noted instances where the model produced locally coherent yet incorrect translations.
To curb the model’s tendency to complete partial input on its own, they created a dataset of prefix-to-prefix data, which consisted of source sentences truncated to varying lengths. ChatGPT was used to generate corresponding target prefixes, which were then integrated into the multilingual training set.
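As a rough sketch of how such prefix-to-prefix pairs might be assembled, the snippet below truncates each source sentence at random word boundaries and asks a caller-supplied function for the matching target prefix. The helper names, the `num_prefixes` parameter, and the placeholder generator are hypothetical; the article does not describe the authors' exact procedure or the prompt sent to ChatGPT.

```python
import random
from typing import Callable, Dict, List


def make_source_prefixes(sentence: str, num_prefixes: int,
                         rng: random.Random) -> List[str]:
    """Truncate a sentence at up to `num_prefixes` distinct word boundaries."""
    words = sentence.split()
    if len(words) < 2:
        return []  # nothing shorter than the full sentence to offer
    # Cut points 1..len-1 so every prefix is strictly shorter than the sentence.
    count = min(num_prefixes, len(words) - 1)
    cuts = sorted(rng.sample(range(1, len(words)), count))
    return [" ".join(words[:cut]) for cut in cuts]


def build_prefix_pairs(
    sentences: List[str],
    generate_target_prefix: Callable[[str], str],
    num_prefixes: int = 2,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Pair each truncated source with a generated target prefix."""
    rng = random.Random(seed)
    pairs: List[Dict[str, str]] = []
    for sentence in sentences:
        for prefix in make_source_prefixes(sentence, num_prefixes, rng):
            pairs.append({
                "source_prefix": prefix,
                # Assumption: in the paper this role is played by ChatGPT.
                "target_prefix": generate_target_prefix(prefix),
            })
    return pairs


# Placeholder generator for demonstration only; it does not translate.
pairs = build_prefix_pairs(["the cat sat on the mat"],
                           generate_target_prefix=lambda p: f"<target for: {p}>")
```

The resulting pairs could then be mixed into the full-sentence training data, as the article describes.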
The evaluation was conducted using nine language pairs from the MuST-C dataset. All language pairs had English as the source language and consisted of TED talk speech data.
The training set contained between 100,000 and 200,000 samples per language pair, along with 2,000 test samples. The total training set size, including the additional 9,000 prefix samples, amounted to 1.9 million samples.
Evaluation metrics included BLEU for translation quality and LAAL for latency, measured using the SimulEval toolkit. The Llama2-7B-chat model was used as the LLM.
The results showed that the new approach enabled LLMs to achieve their intrinsic offline translation performance during simultaneous decoding. After SFT, the models outperformed dedicated simultaneous translation models while maintaining lower latency. The addition of prefix training led to slight performance improvements in low-latency scenarios.
“In future work, we plan to validate this approach across a wider range of LLMs and languages and explore its integration with speech modalities,” the authors said.
Authors: Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari