Few-Shot Translation
In the paper, the Google researchers pointed out that previous work has explored region-aware MT. However, that work assumes the availability of large-scale datasets in which examples are explicitly labeled with the target variety, and such datasets are in many cases unavailable or expensive to create.
In light of this data scarcity, they proposed FRMT as a benchmark for few-shot translation. It measures an MT model’s ability to translate into regional varieties when given no more than 100 labeled examples of each language variety. MT models need to learn from this small number of labeled examples to identify similar patterns in their unlabeled training examples. This enables them to generalize and produce correct translations of phenomena not explicitly shown in the examples, they explained. “Few-shot approaches to MT are attractive because they make it much easier to add support for additional regional varieties to an existing system,” they said.
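One way to picture the few-shot setup is as a prompt that places a handful of labeled exemplars in front of the sentence to be translated. The sketch below is only an illustration under stated assumptions: the function name `build_few_shot_prompt`, the prompt layout, and the example sentences are hypothetical and are not taken from the FRMT paper or its evaluation code. The only detail drawn from the article is the cap of 100 exemplars per variety.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    exemplars: List[Tuple[str, str]],
    source_sentence: str,
    variety: str,
    max_exemplars: int = 100,  # the benchmark allows up to 100 labeled examples per variety
) -> str:
    """Format labeled (English, translation) pairs plus a new source sentence as one prompt."""
    if max_exemplars < 0:
        raise ValueError("max_exemplars must be non-negative")
    chosen = exemplars[:max_exemplars]
    # Each exemplar shows the model an English sentence and its region-specific translation.
    blocks = [f"English: {en}\n{variety}: {tr}" for en, tr in chosen]
    # The final block leaves the translation empty for the model to complete.
    blocks.append(f"English: {source_sentence}\n{variety}:")
    return "\n\n".join(blocks)


# Hypothetical exemplars for illustration only; they are not FRMT data.
exemplars_pt_br = [
    ("The bus is late.", "O ônibus está atrasado."),
    ("I lost my cell phone.", "Perdi meu celular."),
]
prompt = build_few_shot_prompt(
    exemplars_pt_br, "Where is the train station?", "Portuguese (Brazil)"
)
print(prompt)
```

Swapping in exemplars labeled for a different variety, for example Portugal instead of Brazil, changes the target region without retraining, which is the property that makes few-shot control appealing for adding new varieties.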
Dataset Creation
The dataset covers two regions each for Portuguese (Brazil and Portugal) and Mandarin (Mainland and Taiwan), and was created by sampling English sentences from Wikipedia and acquiring professional human translations in the target regional varieties. Final quality verification was done through manual evaluation by an independent set of translators, using the Multidimensional Quality Metrics (MQM) framework.
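To give a rough sense of how MQM-style annotations turn into a number, the sketch below averages severity-weighted error penalties over sentences. It is a simplified illustration, not the scoring used in the paper: the article does not state the weighting scheme, so the weights here (1 for a minor error, 5 for a major one) and the names `SEVERITY_WEIGHTS`, `mqm_penalty`, and `mean_mqm` are assumptions, and the annotations are invented.

```python
from typing import Dict, List

# Assumed severity weights for illustration; the article does not specify the scheme used.
SEVERITY_WEIGHTS: Dict[str, float] = {"minor": 1.0, "major": 5.0}


def mqm_penalty(error_severities: List[str]) -> float:
    """Sum the weighted penalties for the errors annotated in one translated sentence."""
    return sum(SEVERITY_WEIGHTS[severity] for severity in error_severities)


def mean_mqm(per_sentence_errors: List[List[str]]) -> float:
    """Average penalty per sentence; lower means fewer or less severe errors."""
    if not per_sentence_errors:
        raise ValueError("need at least one annotated sentence")
    total = sum(mqm_penalty(errors) for errors in per_sentence_errors)
    return total / len(per_sentence_errors)


# Invented annotations: one clean sentence, one with a minor error,
# and one with a major and a minor error.
annotations = [[], ["minor"], ["major", "minor"]]
score = mean_mqm(annotations)
print(f"mean penalty per sentence: {score:.2f}")
```

Under this kind of scheme, a translation that uses the wrong regional term would be annotated as an error by the rater and would raise the penalty for that sentence.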
As the researchers explained, these languages and varieties were selected because they have many speakers who can benefit from increased regional support in MT, and because Portuguese and Mandarin are linguistically very distinct, coming from different language families. The researchers hypothesized that “methods that perform well on both are more likely to generalize well to other languages.” They added, “In principle, those methods should also work for other language distinctions, such as formality and style.”
“With the release of the FRMT data and accompanying evaluation code, we hope to inspire and enable the research community to discover new ways of creating MT systems that are applicable to the large number of regional language varieties spoken worldwide,” the authors said.
PaLM Excels in Region-Aware MT
The evaluation covered a handful of recent models capable of few-shot control. Based on human evaluation with MQM, the baseline methods all showed some ability to localize their output for Portuguese. However, for Mandarin, they mostly failed to use knowledge of the targeted region to produce superior Mainland or Taiwan translations.
Across the models compared, Google’s language model PaLM consistently performed best on both Portuguese and Mandarin. “This performance is impressive when taking into consideration that PaLM was trained in an unsupervised way,” the authors highlighted.
The results suggest that large language models (LLMs) like PaLM “may be particularly adept at memorizing region-specific word choices required for fluent translation.” However, the researchers noted that “there is still a significant performance gap between PaLM and human performance.”
The paper concluded, “In the near future, we hope to see a world where language generation systems, especially MT, can support all speaker communities.” Moreover, the research team said, “We are excited to see how researchers utilize this benchmark in development of new MT models that better support under-represented language varieties and all speaker communities, leading to improved equitability in natural-language technologies.”