The paper, Adaptive Machine Translation with Large Language Models, also explored a less common method of adaptive MT: learning from similar translations (i.e., fuzzy matches) found in approved translation memories. In particular, researchers were interested in real-time adaptation.
“Instead of asking the model to translate a sentence or providing random examples, it turns out that showing the model 1-10 domain-specific translation pairs similar to the sentence to be translated can improve the translation of the new sentence immediately,” Moslem wrote in a January 31, 2023 LinkedIn post, adding that this method is “especially useful for high-resource languages.”
For low-resource languages, the team was able to use fuzzy matches to improve translations from “stronger” MT systems, such as those from DeepL and Google.
Researchers extracted sentence pairs similar to each segment in a test dataset of 3,070 segments covering five language combinations (English into Arabic, Chinese, French, Kinyarwanda, and Spanish).
They found that few-shot translation with GPT-3 using fuzzy matches resulted in the highest-quality translations — to a point.
“At some point, there might be diminishing returns of adding more similar sentences,” the authors conceded. “Increasing the number of fuzzy matches from two sentences to three, four, five, and 10 sentences incrementally improves translation quality, but with smaller quality gains.”
Putting Results into Context
For certain language pairs, GPT-3’s adaptive MT with fuzzy matches was able to outperform more standard encoder-decoder, Transformer-based MT models: for English into French and Spanish, just five fuzzy matches were needed; for Chinese, at least 10.
Slator Machine Translation Expert-in-the-Loop Report
60-page report on the interaction between human experts and AI in translation production, including AI-enabled workflows, adoption rates, postediting, pricing models.
Results for Arabic and Kinyarwanda were not on par with those for other language pairs, since GPT-3.5 mostly supports high-resource Latin-based languages.
Authors attributed the disparity to limited support (Kinyarwanda) and issues with GPT-3’s tokenizer (Arabic).
Overall, though, researchers described the results as “very promising,” and noted that users might adopt one of several “pipelines” in production based on the level of support an LLM offers for a given language pair.