Beam search, in particular, is very effective at maximizing BLEU scores, “but there is a significant cost to be paid in naturalness and diversity,” the researchers wrote. In practice, this means that MT models typically offer no variability in translations, leading to less engaging output. The researchers also suggested that readers who encounter a given language primarily through these more monotonous translations “might develop a warped exposure to that language.”
Gender pronouns were just one of several diversity diagnostics the team introduced in their experiments. The researchers found that, even when translating between two gendered languages, search disproportionately chose the more frequent gender, based on the input.
For English-to-German translations, the researchers noted that because the German word “sie” can translate as “she,” “they,” or “you” in English, the result was a bias toward the more common gender pronoun, “sie.” By contrast, when translating from French or German to English, male pronouns were more heavily represented in the training set, and the bias skewed male accordingly.
A possible alternative to search is sampling, which replaces “she” and “her” with male pronouns less often than search does. However, the authors warned, the field might not be ready to shift away from search just yet, since sampling does not yield the same consistently high BLEU scores that search does.
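To make the contrast between search and sampling concrete, here is a minimal Python sketch. It is a toy illustration, not the researchers' setup: the probabilities in `PRONOUN_PROBS` are invented for the example, and the helper names (`pick_by_search`, `pick_by_sampling`, `count_choices`) are hypothetical. It reduces "search" to picking the single most probable option for one pronoun slot, which is a simplification of how beam search behaves over whole sentences.

```python
import random
from collections import Counter

# Hypothetical model probabilities for one ambiguous pronoun slot.
# These numbers are illustrative only and are not taken from the study.
PRONOUN_PROBS = {"he": 0.6, "she": 0.4}


def pick_by_search(probs):
    """Search-style decoding: always return the most probable option."""
    return max(probs, key=probs.get)


def pick_by_sampling(probs, rng):
    """Sampling: draw an option in proportion to its probability."""
    options = list(probs)
    weights = [probs[option] for option in options]
    return rng.choices(options, weights=weights, k=1)[0]


def count_choices(n, seed=0):
    """Translate the same slot n times with each strategy and tally the results."""
    rng = random.Random(seed)
    search = Counter(pick_by_search(PRONOUN_PROBS) for _ in range(n))
    sampled = Counter(pick_by_sampling(PRONOUN_PROBS, rng) for _ in range(n))
    return search, sampled


if __name__ == "__main__":
    search, sampled = count_choices(1000)
    print("search:  ", dict(search))
    print("sampling:", dict(sampled))
```

Under these assumed numbers, the search-style picker returns the majority pronoun every time, so a 60/40 preference in the model becomes 100/0 in the output. The sampler keeps the minority pronoun in roughly the proportion the model assigns to it, which is the kind of diversity the article describes, at the cost of sometimes choosing a less probable output.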
“The singular focus on improving BLEU leaves no incentive to address issues of diversity,” they wrote. The researchers’ own future work will explore techniques that can achieve high BLEU scores while producing natural-sounding translations.