Voiseed CEO on Why Emotion is the Key Piece in AI Dubbing

At SlatorCon Remote December 2025, attendees heard from Andrea Ballista, CEO and Co-founder of the AI dubbing company Voiseed, as he explored the state of AI dubbing in 2025 and the future of the AI voice space in a presentation titled “Is This Really the Year of AI Dubbing?”

Ballista opened with an overview of the whirlwind of activity in the AI dubbing space in 2025. He noted examples like YouTube’s expansion of AI dubbing, as well as the rollout of its “Multi-Language Audio” feature, which allows users to upload their own human or AI dubs.

Other mentions included Meta’s rollout of AI dubbing for Facebook and Instagram, Google’s advances in speech-to-speech translation (S2ST) features, and Zoom announcing real-time S2ST, among others.

Broadly speaking, he observed a variety of trends in AI voice technology, including an expansion of text-to-speech, speech-to-speech, and lip-syncing solutions, as well as a “growing focus on real-time applications.”

He further noted that “AI dubbing is becoming a core segment in a larger space that is [the] AI voice generator market.” According to Ballista, “multiple sources are projecting a market size of 20 billion in 2030 with an impressive growth rate of approximately 30% from 2025 to 2030.”
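For a sense of scale, here is a rough back-of-envelope reading of those figures. It assumes the roughly 30% is a compound annual growth rate applied over the five years from 2025 to 2030. The presentation as quoted does not state this, nor the currency, so the result is only indicative:

```latex
V_{2025} \approx \frac{V_{2030}}{(1+g)^{5}} = \frac{20}{1.30^{5}} \approx \frac{20}{3.71} \approx 5.4 \text{ billion}
```

Under that assumption, the projection implies a market of a little over 5 billion in 2025 that nearly quadruples by 2030.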

So, has 2025 been the year of AI dubbing? Ballista’s answer was “yes.” However, he elaborated that the AI voice sector is growing beyond dubbing, saying, “it’s not just AI dubbing. Voice is now taking the lead in multiple use cases for conferences, interpreting, voice agents, media, entertainment, and gaming.”

In such a fast-evolving environment, Language Solutions Integrators (LSIs) and Language Technology Platforms (LTPs) are hard-pressed to keep up with the pace of change.

Ballista noted a need for upskilling in the AI voice sector, saying, “LTP outputs’ quality is improving rapidly, and pushing LSIs to adapt to stay efficient. So, LSIs need to have new skills and competence internally to become, and remain, experts.”

He further highlighted the need for “balancing quality, cost and time, [and] establishing and maintaining the appropriate mix of AI tool workflow and human involvement,” in order to succeed in the AI voice sector going forward.

Developing a “Symbolic Emotional Compass”

While discussing the need to continue improving AI voice solutions, Ballista noted that “current AI systems basically transfer expressions across languages, but often they are missing linguistic and cultural nuances.” He posed the question, “how can we talk to an AI voice model to get the best emotional take in the shortest time possible?”

Taking inspiration from existing frameworks for emotional modeling, Ballista then described an approach for using a “symbolic emotional compass” to “serve as [a] universal language-independent symbolic guide for voice delivery.”

To illustrate the idea, he made a comparison to music, noting that written music uses symbolic elements to guide musicians and “help performers to convey the composer’s original intent.” Extending the analogy, he noted that musical symbols are not limited to one instrument but can be applied universally by performers on any instrument.

While acknowledging that Speech Synthesis Markup Language (SSML) “has also a similar symbolic approach,” Ballista highlighted that “currently it still lacks updates to capture these speech emotional nuances.”
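To make the SSML point concrete, the sketch below builds a small SSML fragment in Python using only the standard `prosody` element, whose `rate`, `pitch`, and `volume` attributes are part of the W3C specification. The function name, the sample line, and the idea of approximating a "sad" read with slow, low, soft delivery are illustrative assumptions and were not part of Ballista's presentation. The namespace and language declarations a fully conformant document would carry are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def sad_line_ssml(text: str) -> str:
    """Approximate a 'sad' read using only standard SSML prosody controls.

    Hypothetical helper for illustration: core SSML has no element that
    names an emotion, so the emotion has to be translated by hand into
    rate, pitch, and volume settings.
    """
    speak = ET.Element("speak", {"version": "1.1"})
    prosody = ET.SubElement(
        speak,
        "prosody",
        {"rate": "slow", "pitch": "low", "volume": "soft"},
    )
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(sad_line_ssml("I never thought it would end like this."))
```

The markup tells a synthesizer how fast, how high, and how loud to speak, but nothing in it says which emotion the line should carry. That appears to be the kind of gap a language-independent emotional notation would aim to fill.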

Looking to the future

Despite the rapid evolution of the AI voice sector, Ballista also touched on some challenges that remain unsolved.

One example was the ongoing focus on latency. He noted, “we can see that real-time features will focus on low latency continuous attention.”

Giving the example of a user interacting with a conversational voice agent, he said, “when you are talking with the voice agent, and you are starting to interrupt the voice agent, […] they stop, and they pay attention to what you’re saying and redefine the answer.”

Going beyond the technical, Ballista also addressed the evolving legalities around AI dubbing. He stated that “ethical and legal frameworks are becoming more and more relevant since actor performances and related assignment of rights are still lacking a broadly accepted solution.”

He noted, however, that voice cloning is moving towards a “consent-control-compensation” model, as obtaining consent to clone an actor’s voice has increasingly become the norm.
