To address this issue, they proposed an approach to recover the “lost” lexical diversity in MT through a tailored re-ranking of translation candidates. Rather than applying a rigid increase in lexical diversity across all texts, their approach adapts the recovery process to align with the diversity of the original work.
Model-Agnostic
The process begins with the MT system generating multiple translation hypotheses for a given source text. A classifier then assesses these hypotheses, estimating the likelihood that each one resembles an original text in the target language. Additionally, each original text is assigned a lexical diversity score that reflects its vocabulary richness, which is factored into the re-ranking process.
Translation hypotheses are sorted based on their probabilities of being original texts, with the final selection influenced by the original text’s lexical diversity score. This means that a translation hypothesis with a high probability may be bypassed if it does not match the desired lexical richness.
The output is a translation hypothesis that best balances the likelihood of being original with the lexical diversity score, ensuring the translation conveys meaning while reflecting the original’s stylistic richness.
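The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the article does not say exactly how the lexical diversity score influences the choice, so the sketch assumes the score is normalised to the range 0 to 1 and maps it to a position in the sorted list. The names `Hypothesis`, `p_original`, and `tailored_rerank` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    # Classifier's estimate that the text reads like an original (assumed in [0, 1]).
    p_original: float


def tailored_rerank(hypotheses, source_diversity):
    """Pick one hypothesis, using the source's diversity score to choose a rank.

    Assumption (not from the article): source_diversity is normalised to [0, 1],
    and a richer source selects a more original-like candidate.
    """
    if not hypotheses:
        raise ValueError("need at least one hypothesis")
    if not 0.0 <= source_diversity <= 1.0:
        raise ValueError("source_diversity must be in [0, 1]")

    # Sort from most to least original-like according to the classifier.
    ranked = sorted(hypotheses, key=lambda h: h.p_original, reverse=True)

    # Map the diversity score to a rank: 1.0 picks the top candidate,
    # 0.0 picks the last one, so a high-probability hypothesis can be bypassed.
    index = round((1.0 - source_diversity) * (len(ranked) - 1))
    return ranked[index]
```

Under this assumed mapping, a source text with a low diversity score skips the most original-like candidates, which mirrors the article's point that a high-probability hypothesis may be bypassed when it does not match the desired lexical richness.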
The researchers emphasized that their approach is “model-agnostic.” As long as the MT system can generate multiple translation candidates for a given text, the re-ranking method can be applied to improve the selection of the best translation.
Closer to Human Quality
To evaluate the effectiveness of this approach, the researchers tested it on 31 English-to-Dutch book translations, employing various metrics, including BLEU and COMET scores for translation accuracy, and lexical diversity scores to assess vocabulary richness.
They compared the tailored re-ranking approach to both vanilla MT and human translations and found that it produced translations whose lexical diversity was closer to that of human translations.
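To make "lexical diversity" concrete, the sketch below computes a type-token ratio (TTR), the share of distinct words among all words in a text. The article does not name the specific diversity metrics the researchers used, so TTR is an assumption chosen here only because it is a common and simple measure; the function name `type_token_ratio` and the regex-based tokenisation are likewise illustrative.

```python
import re


def type_token_ratio(text):
    """Return distinct words divided by total words (0.0 for empty text)."""
    # Simple tokenisation: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

A higher ratio indicates a richer vocabulary. TTR is sensitive to text length, so comparisons are only meaningful between texts of similar size.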
Authors: Esther Ploeger, Huiyuan Lai, Rik van Noord, and Antonio Toral