Here’s Europe’s Latest Attempt at End-to-End Speech Translation

If there were ever a shortlist of projects that had the potential to produce a Babel Fish-type translation device, this would probably be on it.

Backed by the European academe, private sector, and government, the project is called ELITR (pronounced “eh-lee-ter”), also known as European Live Translator. The project was born out of the need to provide subtitles for a EUROSAI Congress back in May.

EUROSAI is the European Organization of Supreme Audit Institutions. The Supreme Audit Office of the Czech Republic initiated the project to help translate speeches in real time from six source languages into 43 target languages: 24 EU languages plus 19 EUROSAI languages (e.g., Armenian, Russian, Bosnian, Georgian, Hebrew, Kazakh, Norwegian, Luxembourgish).

In an ELITR demo video, Charles University Associate Professor Ondřej Bojar said the project also looks into the possibility of “going directly from the source speech into the target language with an end-to-end spoken language translation system.”

In short, speech-to-speech translation (S2ST). For ELITR, however, Bojar told Slator, “We stop at the target text. We are not including the final text-to-speech — although we definitely could.”

S2ST has become a sort of brass ring in research and big tech, tackled by the likes of Apple, Google (via the so-called “Translatotron”; SlatorPro), and prominent Japanese researchers, who uploaded a toolkit for it on GitHub. Chinese search giant Baidu even drew some flak for claims around it, and, of course, there is a whole graveyard of translation gadgets from companies that tried to commercialize S2ST.

Admittedly, ELITR’s production pipeline currently relies on two independent steps: automatic speech recognition (ASR) and machine translation (MT). According to Bojar, “we are actually quite good in these two steps” (as evidenced by a paper published on June 17, 2021, and two others published in September and October 2020).
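To make the two-step design concrete, here is a minimal Python sketch of a cascaded pipeline: recognize the speech first, then translate the resulting text. It is an illustration only, not ELITR's code. The functions `toy_asr`, `toy_mt`, and `cascade`, the `TOY_LEXICON` table, and the "audio:" string convention are all hypothetical stand-ins for trained models and real audio input.

```python
from typing import Callable, Iterable, List

# Hypothetical toy lexicon standing in for a trained MT model.
TOY_LEXICON = {"dobrý": "good", "den": "day"}


def toy_asr(audio_chunk: str) -> str:
    """Pretend recognizer: the 'audio' is a string of the form 'audio:<words>'."""
    return audio_chunk.split(":", 1)[1]


def toy_mt(source_text: str) -> str:
    """Pretend translator: word-by-word lookup, unknown words pass through."""
    return " ".join(TOY_LEXICON.get(word, word) for word in source_text.split())


def cascade(
    audio_chunks: Iterable[str],
    asr: Callable[[str], str],
    mt: Callable[[str], str],
) -> List[str]:
    """Run ASR, then MT, on each chunk. Output stops at target text (no speech synthesis)."""
    return [mt(asr(chunk)) for chunk in audio_chunks]


subtitles = cascade(["audio:dobrý den"], toy_asr, toy_mt)
print(subtitles)
```

Because the two steps only exchange text, either one can be swapped out independently. The flip side is that MT never sees the audio, so anything ASR drops or gets wrong is passed straight on to the translation.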

“We’re also investigating the possibilities of going directly from the source speech into the target language with an end-to-end spoken language translation system” (Ondřej Bojar, Associate Professor, Charles University)

End-to-end speech translation is part of the long-term vision, as outlined in a recent paper published on the Association for Computational Linguistics portal. “The goal of a practically usable simultaneous spoken language translation (SLT) system is getting closer,” wrote the authors from Charles University, Karlsruhe Institute of Technology, the University of Edinburgh, and Italy-based automatic speech recognition (ASR) provider PerVoice. SLT also encompasses off-line spoken language systems, the authors said.

The authors (Bojar among them) mentioned two problems with the current system that have yet to be solved.

Intonation – which cannot be factored in, as punctuation prediction has no access to sound; and
Segmentation errors – that is, MT systems tending to “normalize word order,” thus reducing fluency in a stream of spoken sentences.

Hence, “for the future, we consider three approaches,” Bojar et al. added: “(1) training MT on sentence chunks, (2) including sound input in punctuation prediction, or (3) end-to-end neural SLT.”

Working alongside Charles University on ELITR were the University of Edinburgh and Karlsruhe Institute of Technology. ASR provider PerVoice and Germany-based video conferencing platform alfaview also participated in the project. Does this mean commercialization plans are on the drawing board?

Bojar told Slator, “For a research institute at a university, commercialization is always something that takes an unbearably long time, but we are definitely very open to many forms of collaboration.”
