A new research paper by Amazon explores the topic of quality estimation (QE) in translated subtitles. The paper, entitled “DeepSubQE: Quality estimation for subtitle translations,” presents a new system for estimating the quality of translated subtitles, whether human- or machine-generated. The DeepSubQE system reduces both cost and time in subtitle translation while assuring quality, the researchers said.
“Low translation quality can cause increased usage drop-off and hurt content viewership for audience of target language” — Prabhakar Gupta and Anil Nelakanti, Amazon Prime Video, International Expansion
Subtitling and dubbing are indeed topical for Amazon. As the researchers explained, the “digital entertainment industry is growing multifold with ease of internet access and numerous options for on-demand streaming platforms.” And, although a separate paper published by Amazon in January 2020 looked into machine dubbing, Gupta and Nelakanti pointed out that “translation of subtitles across languages is a preferred cost effective industry practice to maximize content reach.”
Subtitling may be a more cost-efficient approach than dubbing, but the researchers also noted that using human translators to localize subtitles is expensive, and the “man-power cost” increases significantly in the case of low-resource target languages. Since a second person normally checks the translated subtitles to improve the quality of the translation where needed, quality evaluation is “as expensive as generating the translation itself,” they said.
Exploring an automated approach for estimating the quality of translated subtitles is therefore logical, and all the more so for Amazon, which is among the biggest players in the on-demand streaming space.
Good, Bad or Loose
Quality estimation for any kind of translation is challenging because there is more than one way to translate a given sentence into a specific target language. Legitimate translation techniques such as paraphrasing, rephrasing and the use of idiom, which are common in subtitle translation, complicate the matter further since they confound binary methods of quality estimation.
Most QE methods are binary, however, and simply classify translations as “good” or “bad.” They fail to deal with “‘loosely’ translated samples that often occur due to human judgment.”
Under Amazon’s DeepSubQE model, which introduces a “loose” translation category, a good translation is one that retains “all meaning from [the] source and reads fluently”; a bad translation is one that bears no resemblance to the meaning of the original and is “disconnected from the context in the video”; and a loose translation is one that uses paraphrasing, colloquialism, or idiom, or “contains some contextual information not available in [the] source text.”
The researchers also noted how categorizing subtitle translations into the three categories helps to indicate what further work the translations require, if any. Good translations are fine as is and need no further improvement, while loose translations may require human post-edits and bad translations need a “complete rewrite,” they said.
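The mapping from category to follow-up work can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the names `Quality`, `NEXT_STEP` and `triage` are hypothetical, and the labels are assumed to come from some upstream classifier that is not shown here.

```python
from enum import Enum


class Quality(Enum):
    """The three categories described in the article (names are hypothetical)."""
    GOOD = "good"
    LOOSE = "loose"
    BAD = "bad"


# Follow-up work per category, as the researchers describe it.
NEXT_STEP = {
    Quality.GOOD: "accept as is",
    Quality.LOOSE: "human post-edit",
    Quality.BAD: "complete rewrite",
}


def triage(labels):
    """Group subtitle line indices by the follow-up work their label implies."""
    queues = {step: [] for step in NEXT_STEP.values()}
    for index, label in enumerate(labels):
        queues[NEXT_STEP[label]].append(index)
    return queues
```

Calling `triage` on a list of per-line labels returns one queue of line indices per follow-up action, which is roughly how a three-way label could be used to route only the loose and bad lines to human reviewers.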
The researchers worked with 30,000 video subtitle files in English and their corresponding translations in French, German, Italian, Portuguese and Spanish for their experiments. They found that the DeepSubQE model was accurate in its estimations more than 91% of the time for all five languages. The system performed slightly better for longer sentences.
One area of improvement they noted for DeepSubQE was related to the operational load. Currently, the system relies on training one model per language pair, which requires “considerable operational load,” the researchers said. However, future work could explore using a multilingual model, which would help to reduce the load and also be of benefit to “resource starved languages,” they added.