All shows were originally recorded in English. Where available, the authors acquired audio and video for the English originals and audio tracks for the Spanish and German dubs. “Much of our analysis relies on a subset of 35.68 hours of content with both Spanish and German dubs,” they said.
The authors highlight how the results of the study “challenge a number of assumptions commonly made in both qualitative literature on human dubbing and machine-learning literature on automatic dubbing.”
Furthermore, the results suggest two things: first, that vocal naturalness and translation quality matter more than the commonly emphasized isometric (character length) and lip-sync constraints; second, that the significance of isochronic (timing) constraints calls for a more qualified view.
According to the authors, source-side audio has a substantial influence on human dubs through channels other than the translated words. This indicates the need for further research into automatic dubbing (aka machine dubbing or AI dubbing) systems, most notably, they said, into how to preserve speech characteristics and carry over semantic cues such as emphasis or emotion.
Product, Not Process
The Amazon scientists examined human dubbing by studying not its process but its product: a large set of actual dubbed dialogues from TV shows. Compared to interviews with dubbers, they noted, their approach had “the particular virtue of capturing tacit knowledge brought to bear in the human dubbing process but difficult to write down or explain.”
They were especially curious about how human dubbers balanced several competing interests: semantic fidelity, natural speech, timing constraints, and (convincing) lip-sync. The following factors were considered:
- Isochrony – Do dubbers respect timing constraints imposed by the video and original audio?
- Isometry – Do the original and dub texts have approximately the same number of characters?
- Speech tempo – How much do voice actors vary their speaking rates, possibly compromising speech naturalness, to meet timing constraints?
- Lip sync – How closely do the voice actors’ words match visible mouth movements of the original actors?
- Translation quality – How much will dubbers reduce translation accuracy (i.e., adequacy and fluency) to meet other constraints?
- Source influence – Do source speech traits influence the target in ways not mediated by the words of the dub, indicating emotion transfer?
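To make the first three factors concrete, here is a minimal sketch of how isometry, isochrony, and speech tempo could be measured for one aligned pair of lines. The `Segment` type, the function names, and the specific formulas (a character-length ratio, an intersection-over-union of speech intervals, and characters per second) are illustrative assumptions, not the metrics used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One spoken line: its text plus start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def isometry_ratio(source: Segment, dub: Segment) -> float:
    """Character-length ratio of dub to source; 1.0 means equal length."""
    return len(dub.text) / len(source.text)


def isochrony_overlap(source: Segment, dub: Segment) -> float:
    """Intersection-over-union of the two speech intervals; 1.0 means identical timing."""
    intersection = max(0.0, min(source.end, dub.end) - max(source.start, dub.start))
    union = max(source.end, dub.end) - min(source.start, dub.start)
    return intersection / union if union > 0 else 0.0


def speaking_rate(segment: Segment) -> float:
    """Characters per second, a rough stand-in for speech tempo."""
    return len(segment.text) / (segment.end - segment.start)


# Made-up example pair (not taken from the study's data).
source = Segment("I never said that.", start=12.0, end=13.5)
dub = Segment("Das habe ich nie gesagt.", start=12.0, end=13.6)

print(f"isometry ratio:    {isometry_ratio(source, dub):.2f}")
print(f"isochrony overlap: {isochrony_overlap(source, dub):.2f}")
print(f"source tempo:      {speaking_rate(source):.1f} chars/s")
print(f"dub tempo:         {speaking_rate(dub):.1f} chars/s")
```

In this made-up pair the dub is about a third longer in characters (ratio of roughly 1.33) yet overlaps the original timing almost completely (about 0.94), because the voice actor speaks faster. That is the kind of gap between isometry and isochrony the factors above are meant to separate.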
The study focused on two language pairs (EN-DE; EN-ES). In future work, the authors hope to analyze more distant language pairs, such as English–Chinese or English–Arabic, as well as non-English source material.
The authors pointed out, “Our analysis has shown that isometry is a poor proxy for isochrony in human dubs, yet several prior works have claimed that isometric MT benefits automatic dubbing. In future work, we hope to perform analysis to understand this discrepancy.”