sync. Unveils Sneak Peek of Upcoming Lip Sync Model at SlatorCon

At SlatorCon Silicon Valley 2025, Prajwal Renukanand, Co-Founder and Chief Scientist at sync., presented the key developments of lip sync technology and gave insight into what the future holds for AI dubbing and video localization.

Prajwal described how 2019 was “actually a big moment, because that was when voice cloning could be done very well. And that was a big moment for translation as well, because when you want to do a translation, you want to preserve the [original speaker’s] voice. Before this, humans always [did] it better. The shift is not just a question of cost and time savings, it’s also a question about engagement and user experience in general,” he said.

Until that point, lip sync technology was also difficult to scale. Prajwal told the SlatorCon audience that “you would train [a model] on hours of video of a specific person, and you [would] get one model for each scene, [but] it only works for cases that you’ve trained for. The dream was [to have] a single model for every speaker [regardless of] language or voice.”

“Lip sync was the missing piece in the puzzle,” added Prajwal, who then showed the SlatorCon audience a video of Princess Diana — one of the first lip-synced videos using an out-of-the-box model with no pre-training, back in 2019. While the technology was revolutionary at the time, there was still a long way to go in terms of accuracy and authenticity.

Fast forward to 2025, and Prajwal painted a picture of how far lip sync has come. Prajwal’s company, sync., recently launched lipsync-2 — a zero-shot model for generating realistic lip movements that match spoken audio.

“One of the things that really led to the popularity of lipsync-2 is that it was the first time a model [could generate] mouth movements matching the actor’s style without actually training on each actor,” he said.

Prajwal showed the SlatorCon audience a clip of Nicolas Cage speaking in multiple languages, using Cage’s own voice, emotion, and intonation in each target language. Prajwal described how Cage’s mouth “moves like how he would speak [in the target language] rather than how anyone else would speak.”

“This breakthrough is crucial for translation because you want to keep the same style, but you just want to change the language. And that’s why lipsync-2 became really popular, because it had this key differentiator of preserving the speaker’s style,” he added.

sync. released a pro version of the lipsync-2 model last week, which Prajwal described as “near studio grade” because it works in high resolution and supports facial features such as beards and teeth.

On the subject of facial features, Prajwal gave the SlatorCon audience an exclusive sneak peek of sync.’s upcoming model, to be released later this month, which enables users to edit emotions and head movements to match the new audio track in an existing video. Prajwal showed the audience how the video editing model completely changed the speaker’s mood from happy to sad, as if the clip had originally been filmed with that emotion.

Prajwal said that “video translation is going through a transitional phase. From where I see it, it’s like black and white TVs in the 1950s. People started [buying] color TVs when consumers and creators realized that [color TV gave a] much better viewing experience. [Lip-synced video] leads to increased costs, but it leads to a much more engaging experience.”

“Once people see enough dubbed videos, there is no question that there is no going back after that. I believe it is the future,” he concluded.