To that end, they conducted experiments to assess how MT and HT differ in terms of morphosyntactic divergence, to understand the source of this difference, and to explore how translation divergences in HT affect MT quality. The experiments covered three language pairs (English-French, English-German, and English-Chinese) and used WMT datasets.
Conservative Machine Translation
The results revealed that MT is more “conservative” than HT, exhibiting less morphosyntactic diversity, more convergent patterns, and more one-to-one alignments. They also observed that MT tends to be less similar to HT when the source has less common structures.
The authors attributed this discrepancy to the use of beam search, which biases MT towards more convergent patterns. This bias is most prominent when convergent patterns appear in roughly 50% of cases in the training data. “This could be because the model has seen the pattern enough to assign it substantial probability mass, but there is still enough uncertainty that humans will frequently choose other patterns,” the authors said.
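The intuition behind this effect can be illustrated with a toy calculation. The sketch below is not the authors' method: it assumes a hypothetical two-way choice between a convergent and a divergent pattern, treats human translators as picking each pattern in proportion to its probability, and treats the decoder as always picking the more probable one (a crude stand-in for beam search). All function names are made up for illustration.

```python
def sampled_frequency(p_convergent: float) -> float:
    """Expected share of the convergent pattern when each translation
    is drawn in proportion to its probability (assumed stand-in for HT)."""
    return p_convergent


def mode_seeking_frequency(p_convergent: float) -> float:
    """Share of the convergent pattern when the decoder always emits the
    more probable of the two patterns (assumed stand-in for beam search)."""
    return 1.0 if p_convergent > 0.5 else 0.0


def amplification(p_convergent: float) -> float:
    """Gap between the mode-seeking share and the sampled share."""
    return mode_seeking_frequency(p_convergent) - sampled_frequency(p_convergent)


if __name__ == "__main__":
    # The gap is widest just above 0.5 and shrinks as the pattern
    # becomes near-certain.
    for p in (0.55, 0.75, 0.95):
        print(f"p={p:.2f}  gap={amplification(p):+.2f}")
```

Under these assumptions, the gap between the two behaviors is largest when the convergent pattern is only slightly more likely than its alternative, which is consistent with the bias peaking near 50%. The toy model does not capture the increase the authors also report for patterns that are uncommon in HT.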
Moreover, the frequency of convergent patterns increases in MT even when those patterns are uncommon in HT, which may point to a more inherent structural bias in current MT architectures.
Lastly, they investigated how the presence of morphosyntactic divergence in HT might affect MT quality. They found that, for most morphosyntactic divergences, their presence in HT correlates with lower MT performance, indicating that these constructions pose a greater challenge for MT systems.
The authors emphasized that “this is the first work to present the comparative perspective of HT vs MT in such fine granularity covering thousands of morphosyntactic constructions,” and expressed interest in applying the same analysis to large language model (LLM)-based MT systems to see whether and how LLM translations differ from those produced by traditional MT models.