Machine Translation Is More ‘Conservative’ Than Human Translation, Google Says

In a paper published on January 2, 2024, Google researchers Jiaming Luo, Colin Cherry, and George Foster compared morphosyntactic divergence in machine translation (MT) against human translation (HT) and found that MT tends to be more “conservative” than HT.

The authors explained that translation divergences occur when a translation differs structurally from its source sentence, whether because of inherent cross-lingual differences or the idiosyncratic preferences of translators. Such divergences arise naturally in the translation process and are readily found in human translations, including those used to train MT systems, they said.

They also highlighted that the existence of these divergences in HT has long been regarded as a key challenge for MT, and recent studies have demonstrated the abundance of translation divergences in HT.

The researchers conducted experiments to assess how MT and HT differ in terms of morphosyntactic divergence, to understand the source of this difference, and to explore how translation divergences in HT affect MT quality. The experiments covered three language pairs (English-French, English-German, and English-Chinese) and used WMT datasets.

Conservative Machine Translation

The results revealed that MT is more “conservative” than HT, exhibiting less morphosyntactic diversity, more convergent patterns, and more one-to-one alignments. The authors also observed that MT tends to be less similar to HT when the source has less common structures.
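To make “less morphosyntactic diversity” and “more one-to-one alignments” concrete, here is a minimal Python sketch. It is not the paper's methodology: the entropy measure, the pattern labels, and both function names are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def pattern_entropy(patterns):
    """Shannon entropy (in bits) of a list of target-side pattern labels.

    Lower entropy means fewer distinct patterns are used, i.e. less diversity.
    """
    counts = Counter(patterns)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def one_to_one_share(alignments):
    """Fraction of alignment links whose source and target words are each
    aligned exactly once. `alignments` is a list of (src_idx, tgt_idx) pairs."""
    if not alignments:
        return 0.0
    src_counts = Counter(s for s, _ in alignments)
    tgt_counts = Counter(t for _, t in alignments)
    one_to_one = [
        (s, t) for s, t in alignments
        if src_counts[s] == 1 and tgt_counts[t] == 1
    ]
    return len(one_to_one) / len(alignments)
```

Under these assumed measures, a “conservative” system would show lower pattern entropy and a higher one-to-one share than human translators on the same source sentences.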

The authors attributed this discrepancy to the use of beam search, which biases MT towards more convergent patterns. This bias is most prominent when convergent patterns appear frequently — around 50% of the time — in the training data. “This could be because the model has seen the pattern enough to assign it substantial probability mass, but there is still enough uncertainty that humans will frequently choose other patterns,” said the authors.
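The amplification effect can be illustrated with a toy example. The sketch below is an assumption-laden simplification, not the authors' experimental setup: `TOY_MODEL`, its probabilities, and the pattern names are invented, and the first token simply stands in for the structural pattern of a translation. It contrasts a small beam search, which returns the single highest-scoring hypothesis, with sampling from the same distribution, used here as a rough stand-in for the variety seen across human translators.

```python
import math
import random

# Hypothetical toy model: maps a prefix (tuple of tokens) to next-token
# probabilities. The numbers are invented for illustration only.
TOY_MODEL = {
    (): {"convergent": 0.5, "divergent_a": 0.3, "divergent_b": 0.2},
    ("convergent",): {"</s>": 1.0},
    ("divergent_a",): {"</s>": 1.0},
    ("divergent_b",): {"</s>": 1.0},
}


def beam_search(model, beam_size=2, max_len=5):
    """Return the highest-scoring finished hypothesis as a tuple of tokens."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                hyp = (prefix + (token,), score + math.log(prob))
                (finished if token == "</s>" else candidates).append(hyp)
        if not candidates:
            break
        # Keep only the `beam_size` best unfinished hypotheses.
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
    return max(finished, key=lambda h: h[1])[0]


def sampled_share(model, pattern, n=10_000, seed=0):
    """Share of `pattern` when the first token is sampled from the model."""
    rng = random.Random(seed)
    tokens = list(model[()].keys())
    weights = list(model[()].values())
    draws = rng.choices(tokens, weights=weights, k=n)
    return draws.count(pattern) / n
```

With this toy model, `beam_search(TOY_MODEL)` selects the convergent pattern every time, whereas `sampled_share(TOY_MODEL, "convergent")` stays close to 0.5. A pattern that holds about half of the probability mass is therefore produced far more often by the search procedure than its frequency in the distribution would suggest.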

Moreover, frequencies of convergent patterns in MT increase even when they are uncommon in HT, suggesting perhaps a more inherent structural bias in current MT architectures.

Lastly, they investigated how the presence of morphosyntactic divergence in HT might affect MT quality and found that, for a majority of morphosyntactic divergences, their presence in HT is correlated with decreased MT performance, presenting a greater challenge for MT systems.
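One simple way to check for a relationship of this kind is to correlate a binary indicator (whether a given divergence appears in the human reference) with a sentence-level MT quality score. The sketch below assumes a plain Pearson correlation and a hypothetical `pearson` helper; the article does not specify which statistic or quality metric the authors used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative values, not results from the paper:
# 1 = the divergence is present in the human reference, 0 = absent.
divergence_present = [0, 0, 1, 1]
mt_quality_score = [0.8, 0.9, 0.5, 0.6]
r = pearson(divergence_present, mt_quality_score)
```

A negative `r` on real data would be consistent with the reported finding that divergent references coincide with lower MT performance, although correlation alone would not establish the cause.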

The authors emphasized that “this is the first work to present the comparative perspective of HT vs MT in such fine granularity covering thousands of morphosyntactic constructions.” They also expressed interest in applying the same analysis to large language model (LLM)-based MT systems to see whether, and how, LLM translations differ from those produced by traditional MT models.
