On December 19, 2016, a Monday, at exactly half past nine, the Twitterverse was alerted to the existence of the OpenNMT project over at the Harvard natural language processing (NLP) group.
The Harvard NLP group comprises researchers who cover areas as varied as “computational models for human language,” machine learning, deep learning, artificial intelligence, and the “intersections between computer science and linguistics.”
The group’s OpenNMT tweet was followed the day after by a wink at Google, which read: “#Google, we promise we are not #taking you on. Please keep on putting out awesome research / feeding my grad students.”
OpenNMT developer Yoon Kim is a Computer Science PhD candidate and member of Harvard NLP. Kim previously earned a Master’s in Data Science from New York University, another Master’s in Statistics from Columbia University, and a baccalaureate in Math and Economics from Cornell.
Working on the project with Kim was his adviser, Alexander Rush, who runs the NLP group. Commercial machine translation provider Systran, which recently launched its own proprietary neural machine translation system, was also involved in the project.
What follows is Slator’s interview with Harvard NLP’s Alexander Rush and Systran CTO Jean Senellart on the OpenNMT project.
Slator: What motivated you to develop OpenNMT? How did this project come about?
Alexander Rush: The project is based on research software built by my graduate student Yoon Kim. We used the software in my lab to do research on improving translation systems and to teach graduate students. We happened to also put the software online for free, and Systran found it. It was useful for their products, and so they began to send us updates to the code. It is the kind of mutually beneficial relationship that open-source communities can produce.
Slator: What exactly is OpenNMT and what does it do?
Rush: Recently, there have been a series of advances in artificial intelligence (AI), leading to improvements in speech, image recognition, and game playing. In the area of natural language processing, these improvements have been most impactful in the area of translation, leading to models that significantly improve on the quality of machine translation.
OpenNMT is open-source software implementing this technology, roughly similar to Google’s proprietary system. It is software to learn models for machine translation. It takes in a corpus of aligned sentences from a source and target language, and learns a mathematical model—known as a neural network—to [perform] translation. That model can then be fed unseen source sentences and OpenNMT will translate them.
We do expect some competitors quickly building products based on this technology—Jean Senellart, Systran CTO
Slator: What makes it different from the commercial solution Systran offers?
Jean Senellart: The core technology we propose to our users will be exactly the same as the one we are contributing for the OpenNMT project. Our business model is to build tailored for our customers. [We] provide complete translation workflow; more features (e.g., document filtering, coupling with other technologies like language detection, entity extraction) than just the core translation.
Slator: Can you give us a simple first use case for OpenNMT?
Rush: We released several example translation models (e.g., German-English). Anyone can download and run the model to experiment with neural machine translation. We publicized the project because we thought it was quite stable; but also with the hope that more people in the translation community would contribute back to further improve it.
In theory, anybody could rent a server and train a model on available data, and we see some hobbyists doing just that—Alexander Rush, Assistant Professor, Harvard School of Engineering and Applied Sciences
Slator: What is your mid- to long-term goal for OpenNMT?
Rush: There are two main focuses. One, we want to keep the code up-to-date with all the new ideas published in the research community, such that the open-source software stays competitive with closed-source offerings (e.g., Google). For instance, my group recently developed a system for shrinking translation models so they can run much faster, and this was implemented in the software even before the paper was published.
Two, we want to try out more cutting-edge “translation” ideas. For example, we are implementing an extension to map from images-to-text using OpenNMT. This is a rather recent research idea that we hope to make more accessible.
Senellart: On Systran’s side, we want this project to contain all the best of breed features and ideas that are published by the research community, but also keep the code simple, fast, so it becomes a reference for anyone wanting to do more research or even create commercial applications.
Slator: Who do you see as early adopters of this technology?
Rush: Great question! In theory, anybody could rent a server and train a model on available data, and we see some hobbyists doing just that. In practice, we expect a mix of researchers studying how to improve translation and people in the industry looking to become familiar with new AI technology.
Senellart: We do expect some competitors quickly building products based on this technology—and this will, of course, be challenging for us. But at the same time, [it is] quite an achievement that will help develop the machine translation market and global awareness about the technology.