Matt Panoussis, Co-founder at LipDub AI, added that while his company’s original focus was to make Hollywood dubs look real, “we were so pleasantly surprised to see so many other industries reaching out, looking for high-quality localization. In terms of where we see the most growth, there’s advertising and marketing where we’re doing everything from organic paid ads online to TV broadcast spots.”
Adapt Global Studios’ Founder and CEO Justin Beaudin told the SlatorCon audience that it’s about “making [content] better with AI, not just faster and cheaper,” stating that “about half of our customers are traditional studios” and that others include YouTube creators with a healthy back catalog.
Guy Piekarz, CEO at Panjaya, agreed with Beaudin, saying “we actually think that it’s not about saving money and making things faster. We can create experiences [like] never before. Overall, I think there’s a big unlock [in the market] in the next 18 months across industries.”
Need for Experts in the Loop
Slator’s Alex Edwards asked the panelists if they ever provide zero-shot AI dubs to clients, and if experts are needed in the AI dubbing process.
Adapt’s Beaudin confirmed that “you need to craft the dubs,” to which LipDub’s Panoussis added, “it depends on the content type and what the expectations of the customer are. When we work with social media, we find more often than not that [clients] are actually okay with the quality of what you get with the combination of speech-to-speech, voice cloning, and lip sync.”
Panoussis added, “Of course, if perfection is the goal, at least with the current state of audio technology, you need a human in the loop that speaks the target language, especially when you’re dealing with entertainment content where more slang and idioms are used [than] when working with informational content.”
RWS’s Thomas agreed: “I would echo exactly that. It’s directly analogous to what we saw with machine translation in the past, where it works great by itself in very limited types of content but once you put a human in a loop you could actually achieve really good results that were as good as human-only.”
Panjaya’s Piekarz went back to Panoussis’ point: “[quality is] context related. [With some content,] you can truly be there in just one shot. We do sports-related behind-the-scenes interviews one shot and it works.”
Evaluating Quality Output of AI Dubbing
Slator’s Edwards asked panelists how they evaluate the output of their own AI dubbing solutions, and what clients can do to assess the quality of output for multilingual video across providers.
Panjaya’s Piekarz said that his company has “created a universe of hundreds of videos that are benchmarked” and which are used to test every release using human evaluators. “As for the customers, they have their own language [experts] to review output as well until they create more trust [with the provider].”
Adapt’s Beaudin called on the industry to develop a framework around quality assessment for AI dubbing. “It has always been hard to judge what the quality of a good dub is,” he said, explaining that quality is often subjective depending on the end user, suggesting that buyers could provide more guidance on what constitutes good quality for them.
On the subject of buyer expectations, LipDub’s Panoussis added, “We have certain buyers that take our software and they do not review the results, and they’re very happy with what they get on the other side. We have other buyers that want to roll their sleeves up, and in those cases, we provide them with tools to make the process of getting to perfect as easy as possible.”
RWS’s Thomas added that “as a collective, we typically focus on linguistic quality as the primary measure of quality. Personally, I feel that the quality of the outcome is the true measure. So why are you producing a video in the first place? Why are you deciding that you want to dub that video? In theory, it’s because you have a particular audience and you want them to have some kind of understanding after watching that video, and that’s the only true measure of quality. And whatever your approach is to that video, if it succeeds at that quality of outcome, job done.”
Demand for Lip Sync in AI Dubbing
On the subject of lip sync demand for AI-dubbed content, Panoussis said, “what’s been really amazing is to see the data finally come out from early adopters. We always had the conviction that whatever metrics mattered to you, whether it be watch time, completion time, engagement, [or] just general video performance, would be dramatically improved with a combination of incredible audio dubbing, voice cloning, and lip sync.”
Panoussis added that more and more customers experience lip syncing and are able to relay the success stories and data. “The demand, as a result, has been exploding,” he concluded.
Panjaya’s Piekarz added that the user experience of watching lip-synced video feels “natural and authentic,” which he agreed “feels like night and day” compared to non-lip-synced video.
The Future of AI Dubbing
Slator’s Edwards asked panelists for a fire-round prediction on the future of AI dubbing. “I would say automation,” said Piekarz. “The better out-of-the-box results we can get, the less work we need to put into it, and the bigger unlock we’ll get,” he concluded.
LipDub’s Panoussis added, “In the next ten years, I really believe that every single video on the internet is going to be localized in this manner. That isn’t to say that subtitles will go away, but I believe every video is going to be localized.”
Adapt’s Beaudin agreed: “voice cloning and lip sync is fundamentally better, so it’s just going to happen. Better wins out eventually. As to the timing, hopefully soon.”
As for RWS’s Thomas, “the future is multimodal and the future is absolute flexibility and democratization of access. Do you want to watch this video in [your language] or with subtitles [in your language]?”
“That’s the big thing that’s coming. We’re on the cusp,” he concluded.