Apple’s work on synthesizing voices could eventually be incorporated into an in-house dubbing process around original content for Apple’s entertainment offering, Apple TV+. Apple has reportedly budgeted billions of dollars for the streaming service.
Another investment that might pave the way for other use cases is Apple’s 2020 acquisition of Voysis, an AI startup whose platform enables retailers to add voice to their websites and mobile apps.
Apple’s recent research suggests other possible directions for voice producers’ work, namely speech-to-speech translation and bilingual text-to-speech, in which a monolingual voice is “taught” to speak a second language.
Naturally, competitors are also hard at work on voice synthesis research, including more ambitious efforts to improve lip sync, that is, matching a speaker’s lip movements to translated audio (e.g., Synthesia).
In 2019, Google debuted Translatotron, a proof-of-concept speech-to-speech translation system that skips the traditional text translation step. A November 2020 paper on Google’s work with AI lab DeepMind introduced a system for “large-scale multilingual audiovisual dubbing.” Amazon, meanwhile, explored automated English-to-Italian dubbing in a January 2020 study.
Since slots for voice producers are open in several languages, it stands to reason that Apple may have set its sights on adding new language combinations or voice “styles” to Siri’s repertoire. Past voice producer postings sought speakers of Arabic, French, Russian, Spanish, and Turkish. (Siri is currently available in 21 languages, although its neural text-to-voice feature is an option in fewer locales.)
In some ways, the ideal creative producer for voice is a jack of all trades. They will hire and manage personnel, such as talent coaches and script supervisors, and coordinate with a variety of colleagues, including writers, translators, engineers, and marketing experts. Voice producers will also be expected to master a number of tools for specific technical work: tracking production progress, validating and improving dialogue translations, and verifying and correcting pronunciation.
Who might fit the bill for such a multifaceted role? Apple is especially interested in hearing from professionals with experience directing audio and video productions; previous relationships with production and post-production houses; knowledge of phonetics and linguistics; and, depending on the specific job post, native fluency in Cantonese, Italian, Japanese, or Mandarin.