WIPO was an early adopter of custom neural machine translation, which was a “major driver” of its exploration of S2T technology to meet WIPO’s specific needs.
The authors also explained that cascading S2T with MT was chosen because it outperformed the end-to-end speech-to-translated-text approach. WIPO already had access to highly performant MT models customized for the meetings domain, trained on its own meeting data and related documents, and it lacked sufficient training data for end-to-end speech translation.
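The cascading idea can be illustrated with a minimal sketch: audio is first transcribed, and the resulting transcript is then passed to a translation step once per target language. The function names, type aliases, and stand-in components below are hypothetical and are not taken from WIPO's system; they only show how the two stages compose.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces: any S2T engine and any MT engine could sit behind these.
Transcriber = Callable[[bytes], str]
Translator = Callable[[str, str], str]


def cascade(
    audio: bytes,
    transcribe: Transcriber,
    translate: Translator,
    target_langs: List[str],
) -> Dict[str, str]:
    """Run S2T once, then feed the transcript to MT for each target language."""
    transcript = transcribe(audio)
    outputs = {"source": transcript}
    for lang in target_langs:
        outputs[lang] = translate(transcript, lang)
    return outputs


# Stand-in components for illustration only; real models would replace these.
def fake_transcribe(audio: bytes) -> str:
    return "the meeting is open"


def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


result = cascade(b"", fake_transcribe, fake_translate, ["fr", "es"])
```

Because the two stages only share a plain-text transcript, each one can be trained and customized on its own data, which is consistent with the data constraints the authors describe.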
Data scarcity presented a challenge not only for the end-to-end speech translation approach but also for speech-to-text development. To address this, the team collaborated with other international organizations to leverage historical meeting data, contracted external providers to transcribe WIPO in-domain audio, and acquired out-of-domain proprietary corpora. This ensured that WIPO’s S2T and MT components were well-tailored to the language used in international organization meetings.
Open-Source, On-Prem
The authors evaluated the system using automatic metrics, such as Word Error Rate (WER) for S2T and BLEU for MT, as well as business-oriented metrics like fitness for purpose, turnaround time, user experience, and cost savings.
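For readers unfamiliar with WER, it is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of words in the reference. The sketch below is a minimal illustration of that standard definition, assuming simple whitespace tokenization; it is not the authors' evaluation code, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
wer = word_error_rate(
    "the committee adopted the agenda",
    "the committee adopted agenda",
)
```

In practice, text is usually normalized (casing, punctuation) before scoring, which this sketch leaves out.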
Despite occasional errors in the produced texts, WIPO reports that users have overwhelmingly embraced the system due to its rapid availability, convenience, and multilingual support.
User feedback has indicated that the benefits, including reduced turnaround time and cost savings, outweigh the drawbacks. This adoption of technology also aligns with WIPO’s policy for increased digitization and has improved working methodologies.
The system’s deployment aligns with WIPO’s strong data security and privacy policies, as it is installed on-premises to handle confidential meetings. The authors emphasized, “our solution, based on open-source tools, is installed on-premises, allowing us to meet our strong data security and privacy policies, and is even fit for our confidential meetings.”
After a year-long pilot phase for essential meetings, the system has been adopted for WIPO’s General Assemblies and by many other international organizations, such as the United Nations Office at Geneva, the International Labour Organization, the World Trade Organization, and the Court of Justice of the European Union, replacing manually prepared verbatim reports.
WIPO has also experimented with OpenAI’s Whisper models, focusing initially on S2T and planning to explore the translation feature in the future. Customization of pre-trained models using in-domain data has been a part of their strategy to improve performance, particularly in recognizing domain-specific terminology.
Looking ahead, WIPO aims to continue improving transcript quality, expanding language support, and exploring different pathways to generate transcript languages.
Authors: Akshat Dewan, Michal Ziemski, Henri Meylan, Lorenzo Concina, Bruno Pouliquen